Abstract

The relation between the nonanticipative rate distortion function (RDF) and filtering theory is discussed on abstract spaces. The relation is established by imposing a realizability constraint on the reconstruction conditional distribution of the classical RDF. Existence of the extremum solution of the nonanticipative RDF is shown using weak*-convergence on an appropriate topology. The extremum reconstruction conditional distribution is derived in closed form for the case of stationary processes. The realization of the reconstruction conditional distribution which achieves the infimum of the nonanticipative RDF is described. Finally, an example is presented to illustrate the concepts.

Index Terms 

nonanticipative rate distortion function (RDF), filtering, realization, weak*-convergence, optimal reconstruction kernel

I. INTRODUCTION 

Shannon's information theory for reliable communication evolved over the years without much 
emphasis on nonanticipation imposed on the communication sub-systems. In particular, the clas- 
sical rate distortion function (RDF) for source data compression deals with the characterization of 
the optimal reconstruction conditional distribution subject to a fidelity criterion 0], ED, without
regard to nonanticipation. 

On the other hand, filtering theory is developed by imposing real-time realizability on the estimators with respect to measurement data. Although both reliable communication and filtering (state estimation for control) are concerned with reconstruction of processes, the main underlying assumptions characterizing them are different.

In this paper, the intersection of rate distortion function (RDF) and real-time realizable filtering 
theory is established by invoking a nonanticipative constraint on the reconstruction kernel to be 
realizable via real-time operations, while the optimal nonanticipative reconstruction kernel is 
derived. Consequently, the connection between nonanticipative RDF, its characterization via the 
optimal reconstruction kernel, and real-time realizable filtering theory is established under very 
general conditions on the source (including Markov sources). 

The fundamental advantage of the new filtering approach based on nonanticipative RDF is
the ability to ensure average or probabilistic estimation error constraints, which is a non-trivial
task if Bayesian filtering techniques are employed to formulate such constraints. The motivation
includes nonanticipative data compression over noisy channels, such as control over networks,
where the controlled system and controller may be connected via a noisy channel 0, 0, 
ED, [0, 0, El. In such applications, filtering via nonanticipative RDF approximates sensor 
measurements by the reconstruction process taking values in a set of smaller cardinality, while 
the approximation is quantified by the distortion function. Given the recent interest in developing 
controller and estimator architectures processing quantized information, nonanticipative RDF can 
deal with constructing estimators with a prescribed accuracy. 

The first relation between information theory and filtering via the distortion rate function is discussed by R. S. Bucy in [9], by carrying out the computation of a realizable (nonanticipative) distortion rate function with a square-error criterion for two samples of the Ornstein-Uhlenbeck Gaussian process. Related work on nonanticipative rate distortion theory is pursued by A. K. Gorbunov and M. S. Pinsker in [10], [11]. Specifically, [10] discusses nonanticipative RDF for general stationary processes and establishes existence of the infinite horizon limit, while [11] computes a closed form expression for the nonanticipative RDF (called $\epsilon$-entropy) for stationary Gaussian processes using power spectral methods. Further elaborations on the similarities and differences between [9], [10], [11] and this paper will be discussed in subsequent parts of the paper. Moreover, over the years several papers appeared in the literature in which controllers or estimators are designed based on information theoretic measures [12], [13], [14]. An earlier work designing filters via information theoretic measures is [15], while [16] analyzes mutual information for Gaussian processes.

In this paper, the connection between nonanticipative rate distortion theory and filtering theory 
is further examined, under a nonanticipative condition defined by the family of conditional 
distributions (reconstructions), for general distortion functions and random processes on abstract 
Polish spaces. The connection is established via optimization on the space of conditional distri- 
butions with average distortion constraint and almost sure (a.s.) constraints to account for the 
nonanticipative condition on the reconstruction conditional distribution. The main results are the 
following. 

(1) Existence of the nonanticipative RDF using the topology of weak*-convergence;

(2) Closed form expression for the reconstruction conditional distribution minimizing the nonanticipative RDF for stationary processes;

(3) Realization procedure of the filter based on the nonanticipative RDF; 

(4) Example to demonstrate the realization of the filter. 

It is important to point out that items (1)-(4) above are not addressed in the related papers [9], [10], [11].

Next, we give a high-level discussion of Bayesian filtering theory and nonanticipative RDF, and we present some aspects of the problem pursued in this paper. Consider a discrete-time process $X^n \triangleq \{X_0, X_1, \ldots, X_n\} \in \mathcal{X}_{0,n} \triangleq \times_{i=0}^n \mathcal{X}_i$ and its reconstruction $Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} \triangleq \times_{i=0}^n \mathcal{Y}_i$, where $\mathcal{X}_i$ and $\mathcal{Y}_i$ are Polish spaces (complete separable metric spaces). The objective is to reconstruct $X^n$ by $Y^n$ via nonanticipative operations subject to a distortion or fidelity criterion. That is, for each $i = 0, 1, \ldots$, the reconstruction $Y_i$ of $X_i$ should depend on past and present information $\{X_0, Y_0, X_1, Y_1, \ldots, X_{i-1}, Y_{i-1}, X_i\}$. Once this mapping is found, a procedure is introduced to realize the filter of $Y_i$ from auxiliary measurements (this point is explained in Subsection I-B).

A. Bayesian Estimation Theory

In classical filtering, one is given a mathematical model that generates the process $X^n$, via its conditional distribution $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, 1, \ldots, n\}$ or via discrete-time recursive dynamics, and a mathematical model that generates observed data obtained from sensors, say $Z^n$, via $\{P_{Z_i|Z^{i-1},X^i}(dz_i|z^{i-1},x^i) : i = 0, 1, \ldots, n\}$, while $Y^n$ are the causal estimates of some function of the process $X^n$ based on the observed data $Z^n$. Note that for a memoryless channel that generates the observation sequence $\{Z_i : i = 0, 1, \ldots, n\}$, $P_{Z_i|Z^{i-1},X^i}(dz_i|z^{i-1},x^i) = P_{Z_i|X_i}(dz_i|x_i)$-a.s., $i = 0, 1, \ldots, n$.

In Bayesian estimation one is interested in causal estimators of some function $\Phi : \mathcal{X}_n \mapsto \mathbb{R}$, $Y_n = \Phi(X_n)$, based on the observed data $Z^{n-1} = \{Z_0, Z_1, \ldots, Z_{n-1}\}$. With respect to minimizing the least-squares error pay-off, the best estimate of $\Phi(X_i)$ given $Z^{i-1}$, denoted by $\widehat{\Phi}(X_i)$, is given by the conditional mean

$$\widehat{\Phi}(X_i) \triangleq E\{\Phi(X_i) | Z^{i-1}\} = \int_{\mathcal{X}_i} \Phi(x_i)\, P_{X_i|Z^{i-1}}(dx_i|z^{i-1}), \quad i = 0, 1, \ldots, n.$$

For non-linear problems, Bayesian filtering is often addressed via the conditional distribution $\{P_{X_i|Z^{i-1}}(dx_i|z^{i-1}) : i = 0, 1, \ldots, n\}$ or its unnormalized versions, which satisfy discrete recursions [17] and form a sufficient statistic for the filtering problem.

Consider the simplified example of the multi-dimensional Gauss-Markov process modeled by

$$X_{k+1} = A X_k + B W_k, \quad X_0 \sim N(0; \Sigma_{X_0}), \quad k = 0, 1, \ldots, n-1,$$
$$Z_k = C X_k + D V_k, \quad k = 0, 1, \ldots, n,$$

where $\{A, B, C, D\}$ have appropriate dimensions, $W_k \sim N(0; \Sigma_{W_k})$ (Gaussian with mean zero and covariance $\Sigma_{W_k}$), $V_k \sim N(0; \Sigma_{V_k})$, $k = 0, 1, \ldots, n$, while the processes $\{W_k : k = 0, 1, \ldots, n-1\}$, $\{V_k : k = 0, 1, \ldots, n\}$ are mutually independent, and independent of $X_0$. The classical Kalman filter is a well-known example for which the optimal reconstruction $\widehat{X}_i = E[X_i | Z^{i-1}]$, $i = 0, 1, \ldots, n$, is the conditional mean which minimizes the average least-squares estimation error. Thus, in classical filtering theory both models which generate the unobserved and observed processes, $X^n$ and $Z^n$, respectively, are given a priori, and the estimator $\widehat{X}_i$ is a nonanticipative function of the past information $Z^{i-1}$, $i = 0, 1, \ldots, n$. Fig. I.1 illustrates the cascade block diagram of the Bayesian filtering problem.

Fig. I.1. Block Diagram of Bayesian Filtering Problem
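The nonanticipative character of the Kalman one-step predictor can be sketched in a few lines. The block below is a minimal illustration, not taken from the paper: it assumes a scalar model $X_{k+1} = aX_k + bW_k$, $Z_k = cX_k + dV_k$ with noise variances `sw`, `sv` and initial variance `p0`, and the function name `kalman_predictor` is hypothetical.

```python
def kalman_predictor(zs, a, b, c, d, sw, sv, p0):
    """One-step predictor x_hat[i] = E[X_i | Z^{i-1}] for the assumed scalar model
    X_{k+1} = a X_k + b W_k,  Z_k = c X_k + d V_k,
    with W_k ~ N(0, sw), V_k ~ N(0, sv), X_0 ~ N(0, p0)."""
    x_hat, p = 0.0, p0                 # prior mean and variance of X_0
    estimates, variances = [], []
    for z in zs:
        # Record the estimate BEFORE using z_i: it depends on z_0..z_{i-1} only.
        estimates.append(x_hat)
        variances.append(p)
        s = c * c * p + d * d * sv     # innovation variance
        k = a * p * c / s              # predictor gain
        x_hat = a * x_hat + k * (z - c * x_hat)
        p = a * a * p - k * k * s + b * b * sw
    return estimates, variances
```

Because `estimates[i]` is stored before `zs[i]` is processed, changing a later measurement leaves all earlier estimates unchanged, which is the real-time realizability property discussed above.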



B. Nonanticipative Rate Distortion Theory and Estimation 

In nonanticipative rate distortion theory one is given the process $X^n$, which induces the conditional distributions $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, 1, \ldots, n\}$, and determines the nonanticipative reconstruction conditional distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$ which minimizes the mutual information between $X^n$ and $Y^n$ subject to a distortion or fidelity constraint, via a nonanticipative or realizability constraint. The filter $\{Y_i : i = 0, 1, \ldots, n\}$ of $\{X_i : i = 0, 1, \ldots, n\}$ is found by realizing the reconstruction distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$ via a cascade of sub-systems as shown in Fig. I.2. The point to be made here is that the auxiliary random sequence $\{Z_0, Z_1, \ldots\}$, which is the analog of sensor measurements (in the above discussion of Bayesian estimation), is identified during the realization of the optimal reconstruction distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$. Thus, in Bayesian estimation, the sensor map is given a priori, while in nonanticipative rate distortion theory, this map is identified during the realization of the optimal reconstruction conditional distribution, so that the end-to-end nonanticipative RDF from $X^n$ to $Y^n$ is achieved.

Fig. I.2. Block Diagram of Filtering via Nonanticipative Rate Distortion Function (blocks: Sensor Map, Optimal Reconstruction)

The precise problem formulation of nonanticipative RDF is defined by first introducing the distortion or fidelity constraint and mutual information.

The distortion function or fidelity constraint between $x^n$ and its reconstruction $y^n$ is a measurable function defined by

$$d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty], \quad d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^n \rho_{0,i}(x^i, y^i).$$

For single letter distortion $d_{0,n}(x^n, y^n) = \sum_{i=0}^n \rho(x_i, y_i)$, and for single letter square error distortion $d_{0,n}(x^n, y^n) = \sum_{i=0}^n \|x_i - y_i\|^2$. Moreover, for finite alphabet spaces $\mathcal{X}_i$ and $\mathcal{Y}_i$, the distortion function can be defined in terms of the Hamming distance.
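As a small illustration of the additive distortion just defined, the sketch below evaluates $d_{0,n}$ for scalar sequences. The helper names (`additive_distortion`, `squared_error`, `hamming`) are hypothetical, and scalar symbols are assumed in place of general Polish-space elements.

```python
def additive_distortion(rho, xs, ys):
    """d_{0,n}(x^n, y^n) = sum_i rho(x^i, y^i).
    rho receives the prefixes x^i = xs[:i+1] and y^i = ys[:i+1]."""
    return sum(rho(xs[:i + 1], ys[:i + 1]) for i in range(len(xs)))

def squared_error(x_prefix, y_prefix):
    """Single-letter square error: uses only the current pair (x_i, y_i)."""
    return (x_prefix[-1] - y_prefix[-1]) ** 2

def hamming(x_prefix, y_prefix):
    """Single-letter Hamming distortion for finite alphabets."""
    return 0 if x_prefix[-1] == y_prefix[-1] else 1
```

Passing prefixes rather than single symbols keeps the general form $\rho_{0,i}(x^i, y^i)$ available, while the two single-letter cases simply ignore everything except the last entry.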

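The mutual information between $X^n$ and $Y^n$, for a given distribution $P_{X^n}(dx^n)$ and conditional distribution $P_{Y^n|X^n}(dy^n|x^n)$, is defined by

$$I(X^n; Y^n) \triangleq \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{P_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n). \tag{I.2}$$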

Next, introduce the nonanticipative constraint on the reconstruction distribution. To this end, define the $(n+1)$-fold nonanticipative convolution measure

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^n P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.} \tag{I.3}$$

The set of nonanticipative reconstruction distributions is defined by

$$\overrightarrow{Q}_{ad} \triangleq \Big\{P_{Y^n|X^n}(dy^n|x^n) : P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)\text{-a.s.}\Big\}. \tag{I.4}$$

Note that without the nonanticipative constraint specified by $\overrightarrow{Q}_{ad}$, the connection between filtering theory and rate distortion theory cannot be established, since in general, by Bayes' rule, $P_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^n P_{Y_i|Y^{i-1},X^n}(dy_i|y^{i-1},x^n)$-a.s., and hence, for each $i = 0, 1, \ldots, n$, the conditional distribution $P_{Y_i|Y^{i-1},X^n}(\cdot|\cdot)$ of $Y_i$ will depend on future symbols $\{X_{i+1}, X_{i+2}, \ldots, X_n\}$ in addition to the past and present symbols $\{Y^{i-1}, X^i\}$. However, by imposing the nonanticipative constraint (I.4), at each time instant $i = 0, 1, \ldots$, the reconstruction $Y_i$ of $X_i$ will depend on the past reconstructions $\{Y_0, \ldots, Y_{i-1}\}$ and past and present symbols $\{X_0, \ldots, X_i\}$.
Nonanticipative Distortion Rate Function. The nonanticipative distortion rate function is defined by the minimization over $P_{Y^n|X^n}(dy^n|x^n)$ of the average distortion function subject to a constraint on the mutual information rate $I(X^n; Y^n) \leq R$ and the nonanticipative constraint (I.4) as follows.

$$D^c_{0,n}(R) \triangleq \inf_{P_{Y^n|X^n}(dy^n|x^n) \in \overrightarrow{Q}_{ad} :\ I(X^n;Y^n) \leq R} E\big\{d_{0,n}(X^n, Y^n)\big\}. \tag{I.5}$$

The classical distortion rate function does not impose the nonanticipative constraint $P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$-a.s.; hence the resulting optimal reconstruction distribution of symbol $y_i$ will depend on $(y^{i-1}, x^i)$ and on future symbols $(x_{i+1}, \ldots, x_n)$. Thus, by solving (I.5) and then realizing the conditional distribution, the optimal causal filter will be defined.



At this stage it is important to point out that the nonanticipative condition (I.4) is different from the realizability condition in [9], in which it is assumed that $Y_i$ is independent of $X^*_{j|i} \triangleq X_j - E(X_j | X^i)$, $j = i+1, i+2, \ldots$. Moreover, the nonanticipative condition (I.4) is implied by the nonanticipative condition found in [10], [11], defined by requiring that $X^\infty_{n+1} \leftrightarrow X^n \leftrightarrow Y^n$ forms a Markov chain for any $n = 0, 1, \ldots$ (i.e., $P_{Y^n|X^n,X^\infty_{n+1}}(dy^n|x^n, x^\infty_{n+1}) = P_{Y^n|X^n}(dy^n|x^n)$, $n = 0, 1, \ldots$). The claim here is that the nonanticipative condition (I.4) is more natural and applies to processes which are not necessarily Gaussian with square error distortion function.



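Nonanticipative Rate Distortion Function. An equivalent problem to (I.5) is the nonanticipative RDF defined by

$$R^c_{0,n}(D) \triangleq \inf_{P_{Y^n|X^n}(dy^n|x^n) \in \overrightarrow{Q}_{ad} :\ E\{d_{0,n}(X^n,Y^n)\} \leq D} I(X^n; Y^n). \tag{I.6}$$

The two problems defined by (I.5) and (I.6) are equivalent in the sense that the solution of (I.5) gives that of (I.6) and vice-versa [2]. Moreover, it can be shown that

$$P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)\text{-a.s.} \Longrightarrow I(X^n; Y^n) = \int \log\Big(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \equiv \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}), \tag{I.7}$$

where the notation $\mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is used to point out the functional dependence of $I(X^n; Y^n)$ on $\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$.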
The paper is organized as follows. Section II discusses the problem formulation on abstract spaces. Section III establishes existence of the optimal minimizing kernel, and Section IV derives the stationary solution. Section V describes the real-time realization of nonanticipative RDF. Finally, Section VI demonstrates the filter realization via an example.

II. Formulation of Nonanticipative Rate Distortion Function on Abstract Spaces

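Let $\mathbb{N}^n \triangleq \{0, 1, \ldots, n\}$, $n \in \mathbb{N} \triangleq \{0, 1, 2, \ldots\}$. The source and reconstruction alphabets, respectively, are sequences of Polish spaces $\{\mathcal{X}_t : t \in \mathbb{N}\}$ and $\{\mathcal{Y}_t : t \in \mathbb{N}\}$, associated with their corresponding measurable spaces $(\mathcal{X}_t, \mathcal{B}(\mathcal{X}_t))$ and $(\mathcal{Y}_t, \mathcal{B}(\mathcal{Y}_t))$, $t \in \mathbb{N}$. Sequences of alphabets are identified with the product spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n})) \triangleq \times_{k=0}^n (\mathcal{X}_k, \mathcal{B}(\mathcal{X}_k))$ and $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n})) \triangleq \times_{k=0}^n (\mathcal{Y}_k, \mathcal{B}(\mathcal{Y}_k))$. The source and reconstruction are random processes denoted by $X^n \triangleq \{X_t : t \in \mathbb{N}^n\}$, $X : \mathbb{N}^n \times \Omega \mapsto \mathcal{X}_t$, and by $Y^n \triangleq \{Y_t : t \in \mathbb{N}^n\}$, $Y : \mathbb{N}^n \times \Omega \mapsto \mathcal{Y}_t$, respectively. Probability measures on any measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are denoted by $\mathcal{M}_1(\mathcal{Z})$. The reconstruction conditional distribution will be defined via stochastic kernels.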

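Definition II.1. Let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ be measurable spaces in which $\mathcal{Y}$ is a Polish space. A stochastic kernel on $\mathcal{Y}$ given $\mathcal{X}$ is a mapping $q : \mathcal{B}(\mathcal{Y}) \times \mathcal{X} \mapsto [0, 1]$ satisfying the following two properties:

(1) For every $x \in \mathcal{X}$, the set function $q(\cdot; x)$ is a probability measure (possibly finitely additive) on $\mathcal{B}(\mathcal{Y})$;

(2) For every $F \in \mathcal{B}(\mathcal{Y})$, the function $q(F; \cdot)$ is $\mathcal{B}(\mathcal{X})$-measurable.

The set of all such stochastic kernels is denoted by $\mathcal{Q}(\mathcal{Y}; \mathcal{X})$.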

An important notion which is used in nonanticipative RDF is conditional independence. The random variable (RV) $Z$ is called conditionally independent of RV $X$ given the RV $Y$ if and only if $X \leftrightarrow Y \leftrightarrow Z$ forms a Markov chain in both directions; equivalently, $P_{X,Z|Y}(dx, dz|y) = P_{X|Y}(dx|y) P_{Z|Y}(dz|y)$-a.s.; equivalently, $P_{Z|Y,X}(dz|y, x) = P_{Z|Y}(dz|y)$-a.s.
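The factorization characterizing conditional independence can be checked numerically on finite alphabets. The following sketch is purely illustrative: it assumes binary alphabets and a joint law built as $P(x)P(y|x)P(z|y)$, and all probability values and helper names are hypothetical.

```python
from itertools import product

# Hypothetical binary Markov chain X -> Y -> Z (illustrative numbers only).
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def marginal(keep):
    """Sum the joint over the coordinates not listed in `keep`
    (indices into the triple (x, y, z))."""
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_y = marginal((1,))
p_xy = marginal((0, 1))
p_yz = marginal((1, 2))

def max_factorization_gap():
    """Largest |P(x,z|y) - P(x|y) P(z|y)| over all (x, y, z)."""
    gap = 0.0
    for (x, y, z), p in joint.items():
        lhs = p / p_y[(y,)]
        rhs = (p_xy[(x, y)] / p_y[(y,)]) * (p_yz[(y, z)] / p_y[(y,)])
        gap = max(gap, abs(lhs - rhs))
    return gap
```

For a joint law constructed this way the gap is zero up to floating-point rounding; replacing `p_z_given_y` by a kernel that also depends on `x` would generally make it strictly positive.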

Stochastic kernels can be used to define anticipative and nonanticipative convolution of reconstruction kernels and associated classical and nonanticipative RDF.

Definition II.2. Given measurable spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n}))$, $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n}))$, and their product spaces, data compression channels are classified as follows.

1) An Anticipative Data Compression Channel is a stochastic kernel $q_{0,n}(dy^n; x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ that admits a factorization into a sequence of stochastic kernels as follows

$$q_{0,n}(dy^n; x^n) = \otimes_{i=0}^n q_i(dy_i; y^{i-1}, x^n)\text{-a.s.}, \tag{II.8}$$

where $q_i(dy_i; y^{i-1}, x^n) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,n})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.

2) A Nonanticipative Convolution Data Compression Channel is a convolution of a sequence of nonanticipative stochastic kernels defined by

$$\overrightarrow{q}_{0,n}(dy^n; x^n) \triangleq \otimes_{i=0}^n q_i(dy_i; y^{i-1}, x^i)\text{-a.s.}, \tag{II.9}$$

where $q_i(dy_i; y^{i-1}, x^i) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.

3) A Restricted Nonanticipative Data Compression Channel is a stochastic kernel $q_{0,n}(dy^n; x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ which is a convolution of a sequence of nonanticipative stochastic kernels via the almost sure (a.s.) constraint defined by

$$q_{0,n}(dy^n; x^n) = \otimes_{i=0}^n q_i(dy_i; y^{i-1}, x^i)\text{-a.s.}, \tag{II.10}$$

where $q_i \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.
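For finite alphabets, the nonanticipative convolution of per-step kernels can be sketched directly. The example below is an illustration under stated assumptions (binary alphabets, two time steps, and the hypothetical kernels `noisy_copy` and `memory_kernel`); it builds $q_{0,n}(y^n; x^n)$ as a product of kernels $q_i(y_i; y^{i-1}, x^i)$ and lets one check that the law of $y_0$ does not change with the future symbol $x_1$.

```python
from itertools import product

ALPHABET = (0, 1)

def noisy_copy(y, past_ys, xs_so_far):
    """Hypothetical kernel q_i(y_i; y^{i-1}, x^i): copy x_i with prob. 0.9."""
    return 0.9 if y == xs_so_far[-1] else 0.1

def memory_kernel(y, past_ys, xs_so_far):
    """Hypothetical kernel using both x_i and the previous reconstruction y_{i-1}."""
    p_one = 0.05 + 0.6 * (xs_so_far[-1] == 1) + 0.3 * (past_ys[-1] == 1)
    return p_one if y == 1 else 1.0 - p_one

def convolve(kernels, xs):
    """q_{0,n}(y^n; x^n) = prod_i q_i(y_i; y^{i-1}, x^i), as a dict y^n -> prob.
    Kernel i is only ever shown x^i = xs[:i+1], never future symbols."""
    q = {}
    for ys in product(ALPHABET, repeat=len(xs)):
        p = 1.0
        for i, kernel in enumerate(kernels):
            p *= kernel(ys[i], ys[:i], xs[:i + 1])
        q[ys] = p
    return q

def prefix_marginal(q, i):
    """Marginal law of y^i = (y_0, ..., y_i) under the kernel q(.; x^n)."""
    out = {}
    for ys, p in q.items():
        out[ys[:i + 1]] = out.get(ys[:i + 1], 0.0) + p
    return out
```

Comparing `prefix_marginal(convolve(kernels, (0, 0)), 0)` with the same quantity for `xs = (0, 1)` shows the causality built into (II.9): the two marginals coincide because the first kernel never sees $x_1$. An anticipative channel as in (II.8) would require passing the whole of `xs` to every kernel.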



As stated earlier, the classical RDF is concerned with optimizing (I.2) with respect to anticipative stochastic kernels (II.8). This paper will address problem (I.5) or (I.6), i.e., when the conditional distribution (stochastic kernel) is restricted nonanticipative. The generalizations of (I.5) or (I.6) to nonanticipative convolution stochastic kernels (II.9) are discussed in the following remark.



Remark II.3. The nonanticipative distortion rate function and the nonanticipative RDF can be generalized as follows.

$$\overrightarrow{D}_{0,n}(R) \triangleq \inf_{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) :\ I(X^n \rightarrow Y^n) \leq R} E\big\{d_{0,n}(X^n, Y^n)\big\} \tag{II.11}$$

$$\overrightarrow{R}_{0,n}(D) \triangleq \inf_{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) :\ E\{d_{0,n}(X^n,Y^n)\} \leq D} I(X^n \rightarrow Y^n) \tag{II.12}$$

where $I(X^n \rightarrow Y^n)$ is the directed information measure from $X^n$ to $Y^n$ defined by

$$I(X^n \rightarrow Y^n) \triangleq \int \log\Big(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes \overleftarrow{P}_{X^n|Y^{n-1}}(dx^n|y^{n-1}) \equiv \mathbb{I}_{X^n \rightarrow Y^n}\big(\overleftarrow{P}_{X^n|Y^{n-1}}, \overrightarrow{P}_{Y^n|X^n}\big)$$

and

$$\overleftarrow{P}_{X^n|Y^{n-1}}(dx^n|y^{n-1}) \triangleq \otimes_{i=0}^n P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1}, y^{i-1})\text{-a.s.}$$

Clearly, (II.11) and (II.12) do not assume $P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1}, y^{i-1}) = P_{X_i|X^{i-1}}(dx_i|x^{i-1})$-a.s., and hence the process $X^n$ can be affected by $Y^n$ causally. This generalization covers conditionally Gaussian sources as a special case [18]. These generalizations will be investigated elsewhere, since they will require new topological spaces on which existence of the optimal solution to (II.11) and (II.12) can be shown.



A. Nonanticipative RDF 

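In this subsection the nonanticipative RDF is rigorously defined on abstract spaces. Given a source probability measure $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ (possibly finitely additive) and a reconstruction kernel $q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, one can define three probability measures as follows.

(P1): The joint measure $P_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$:

$$P_{0,n}(G_{0,n}) \triangleq (\mu_{0,n} \otimes q_{0,n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} q_{0,n}(G_{0,n,x^n}; x^n)\, \mu_{0,n}(dx^n), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}),$$

where $G_{0,n,x^n}$ is the $x^n$-section of $G_{0,n}$ at point $x^n$ defined by $G_{0,n,x^n} \triangleq \{y^n \in \mathcal{Y}_{0,n} : (x^n, y^n) \in G_{0,n}\}$ and $\otimes$ denotes the convolution.

(P2): The marginal measure $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\nu_{0,n}(F_{0,n}) \triangleq P_{0,n}(\mathcal{X}_{0,n} \times F_{0,n}) = \int_{\mathcal{X}_{0,n}} q_{0,n}\big((\mathcal{X}_{0,n} \times F_{0,n})_{x^n}; x^n\big)\, \mu_{0,n}(dx^n) = \int_{\mathcal{X}_{0,n}} q_{0,n}(F_{0,n}; x^n)\, \mu_{0,n}(dx^n), \quad F_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n}).$$

(P3): The product measure $\pi_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}) \mapsto [0, 1]$ of $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ and $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$, for $G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$:

$$\pi_{0,n}(G_{0,n}) \triangleq (\mu_{0,n} \times \nu_{0,n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} \nu_{0,n}(G_{0,n,x^n})\, \mu_{0,n}(dx^n).$$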
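On finite alphabets the three measures (P1)-(P3) reduce to elementary sums. The sketch below is a hypothetical finite-alphabet instance (the dictionaries `mu` and `q` and the function names are assumptions for illustration), with a single symbol standing in for the whole sequence $x^n$.

```python
def joint_measure(mu, q):
    """(P1): P(x, y) = mu(x) q(y; x)."""
    return {(x, y): mu[x] * q[x][y] for x in mu for y in q[x]}

def marginal_measure(mu, q):
    """(P2): nu(y) = sum_x q(y; x) mu(x)."""
    nu = {}
    for x in mu:
        for y, p in q[x].items():
            nu[y] = nu.get(y, 0.0) + mu[x] * p
    return nu

def product_measure(mu, nu):
    """(P3): pi(x, y) = mu(x) nu(y)."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

# Illustrative source law and reconstruction kernel.
mu = {"a": 0.25, "b": 0.75}
q = {"a": {0: 0.8, 1: 0.2}, "b": {0: 0.4, 1: 0.6}}
```

The joint and product measures live on the same product space, which is what allows the relative entropy between them to be formed in the definition of mutual information.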

The precise definition of mutual information between two sequences of Random Variables X n 
and Y n , denoted I(X n ; Y n ) is defined via the Kullback-Leibler distance (or relative entropy) 
between the joint probability distribution of (X n , Y n ) and the product of its marginal probability 
distributions of X n and Y n , using the Radon-Nikodym derivative as follows. 

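Definition II.4. Given a measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, the relative entropy between two probability measures $P, Q \in \mathcal{M}_1(\mathcal{X})$ is defined by

$$\mathbb{D}(P \| Q) \triangleq \begin{cases} \int_{\mathcal{X}} \log\Big(\frac{dP}{dQ}\Big) dP = \int_{\mathcal{X}} \log\Big(\frac{dP}{dQ}\Big) \frac{dP}{dQ}\, dQ & \text{if } P \ll Q \\ +\infty & \text{otherwise} \end{cases}$$

where $\frac{dP}{dQ}$ denotes the Radon-Nikodym derivative (density) of $P$ with respect to $Q$, and $P \ll Q$ denotes absolute continuity of $P$ with respect to $Q$.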

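Hence, by the construction of probability measures (P1)-(P3), and the chain rule of relative entropy [19], the following equivalent definitions of mutual information are obtained.

$$I(X^n; Y^n) \triangleq \mathbb{D}(P_{0,n} \| \pi_{0,n}) \tag{II.13}$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{d(\mu_{0,n} \otimes q_{0,n})}{d(\mu_{0,n} \times \nu_{0,n})}\Big)\, d(\mu_{0,n} \otimes q_{0,n}) \tag{II.14}$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{q_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)}\Big)\, q_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \tag{II.15}$$

$$= \int_{\mathcal{X}_{0,n}} \mathbb{D}\big(q_{0,n}(\cdot; x^n) \| \nu_{0,n}(\cdot)\big)\, \mu_{0,n}(dx^n) \equiv \mathbb{I}(\mu_{0,n}, q_{0,n}). \tag{II.16}$$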


Note that (II.16) states that mutual information is expressed as a functional of $\{\mu_{0,n}, q_{0,n}\}$ and it is denoted by $\mathbb{I}(\mu_{0,n}, q_{0,n})$. Note also that $\mu_{0,n} \otimes q_{0,n} \ll \mu_{0,n} \times \nu_{0,n}$ if and only if $q_{0,n}(\cdot; x^n) \ll \nu_{0,n}(\cdot)$, $\mu_{0,n}$-a.s., which is used to establish that (II.14) is equivalent to (II.15). Necessary and sufficient conditions for existence of a Radon-Nikodym derivative for finitely additive measures can be found in [20].
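The equivalence between the relative-entropy form of mutual information and the averaged form can be verified numerically on finite alphabets. The sketch below is an illustration only (natural logarithms, a hypothetical function name, and a single symbol standing in for $x^n$): it evaluates the relative entropy between the joint and product measures and the $\mu$-average of the relative entropies between $q(\cdot; x)$ and $\nu$.

```python
import math

def mutual_information(mu, q):
    """Return (D(P || pi), sum_x mu(x) D(q(.;x) || nu)) in nats,
    for a finite source law mu and kernel q given as nested dicts."""
    # Output marginal nu(y) = sum_x mu(x) q(y; x).
    nu = {}
    for x, px in mu.items():
        for y, p in q[x].items():
            nu[y] = nu.get(y, 0.0) + px * p
    # Relative entropy between the joint P = mu (x) q and the product mu x nu.
    kl_joint = 0.0
    for x, px in mu.items():
        for y, p in q[x].items():
            joint, prod = px * p, px * nu[y]
            if joint > 0.0:
                kl_joint += joint * math.log(joint / prod)
    # Average over the source of the relative entropy D(q(.;x) || nu).
    avg_kl = 0.0
    for x, px in mu.items():
        d = sum(p * math.log(p / nu[y]) for y, p in q[x].items() if p > 0.0)
        avg_kl += px * d
    return kl_joint, avg_kl
```

For a uniform binary source passed through a binary symmetric kernel with crossover probability 0.1, both returned values agree with $\log 2 - H_b(0.1)$, where $H_b$ is the binary entropy in nats.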

Next, the classical RDF [1] is introduced, since the definition of nonanticipative RDF will be based on the classical definition by imposing the nonanticipative constraint (I.4).



Definition II.5. (Classical Rate Distortion Function) Let $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty]$ be a $\mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$-measurable distortion function, and let $Q_{0,n}(D) \subset \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ (assumed non-empty) denote the average distortion or fidelity constraint defined by

$$Q_{0,n}(D) \triangleq \Big\{q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, q_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \leq D\Big\} \tag{II.17}$$

for $D \geq 0$. The classical RDF associated with the anticipative kernel $q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is defined by

$$R_{0,n}(D) \triangleq \inf_{q_{0,n} \in Q_{0,n}(D)} \mathbb{I}(\mu_{0,n}, q_{0,n}). \tag{II.18}$$

Existence in (II.18) is shown by assuming $d_{0,n}(x^n, \cdot)$ is bounded continuous on $\mathcal{Y}_{0,n}$ while $\mathcal{Y}_{0,n}$ is compact, using weak-convergence of probability measures in [21], and, under more general conditions in which $d_{0,n}(x^n, \cdot)$ is only continuous on $\mathcal{Y}_{0,n}$, using weak*-convergence of measures on Polish spaces [22].
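For finite alphabets, one point on the classical rate distortion curve can be computed numerically. The sketch below uses the Blahut-Arimoto iteration, a standard numerical method that is not part of this paper; the function name, the slope parameter `s`, and the single-letter setting are assumptions made for illustration.

```python
import math

def blahut_arimoto(mu, d, s, iters=200):
    """Compute one point (R, D) on the classical rate distortion curve.
    mu: list of source probabilities; d[x][y]: distortion matrix;
    s > 0: Lagrange slope parameter trading rate against distortion.
    Rates are in nats."""
    nx, ny = len(mu), len(d[0])
    nu = [1.0 / ny] * ny                      # initial output marginal
    q = [[1.0 / ny] * ny for _ in range(nx)]
    for _ in range(iters):
        # Kernel update: q(y; x) proportional to nu(y) exp(-s d(x, y)).
        for x in range(nx):
            w = [nu[y] * math.exp(-s * d[x][y]) for y in range(ny)]
            total = sum(w)
            q[x] = [wi / total for wi in w]
        # Marginal update: nu(y) = sum_x mu(x) q(y; x).
        nu = [sum(mu[x] * q[x][y] for x in range(nx)) for y in range(ny)]
    dist = sum(mu[x] * q[x][y] * d[x][y] for x in range(nx) for y in range(ny))
    rate = sum(mu[x] * q[x][y] * math.log(q[x][y] / nu[y])
               for x in range(nx) for y in range(ny) if q[x][y] > 0.0)
    return rate, dist, q
```

For a uniform binary source with Hamming distortion, the returned pair satisfies the known closed form $R(D) = \log 2 - H_b(D)$ in nats. The sketch handles only the classical (anticipative) single-letter problem; it does not impose the nonanticipative constraint studied in this paper.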

Unfortunately, for general sources and distortion function $d_{0,n}(x^n, y^n)$, the optimal reconstruction $q^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^n q^*_i(dy_i; y^{i-1}, x^n)$ is anticipative, and hence the link to filtering theory cannot be established due to dependence of $y_i$ on $(y^{i-1}, x^i)$ and on future symbols $(x_{i+1}, \ldots, x_n)$. This raises the question whether the classical RDF can be reformulated so that the optimal reconstruction kernel is nonanticipative. Before the definition of nonanticipative RDF, we introduce a lemma which gives insight into how classical and nonanticipative RDF are related.

The next lemma relates nonanticipative convolution reconstruction kernels and conditional independence.

Lemma II.6. The following are equivalent for each $n \in \mathbb{N}$.

1) $\overrightarrow{q}_{0,n}(dy^n; x^n) = q_{0,n}(dy^n; x^n)$-a.s. (see Definition II.2-3).

2) For each $i = 0, 1, \ldots, n-1$, $Y_i \leftrightarrow (X^i, Y^{i-1}) \leftrightarrow (X_{i+1}, X_{i+2}, \ldots, X_n)$ forms a Markov chain.

3) For each $i = 0, 1, \ldots, n-1$, $Y^i \leftrightarrow X^i \leftrightarrow X_{i+1}$ forms a Markov chain.

Moreover, if $X^n_{i+1} \leftrightarrow X^i \leftrightarrow Y^i$ forms a Markov chain for each $i = 0, 1, \ldots, n-1$, then any of the statements 1), 2), 3) holds.

Proof: This is straightforward; hence the derivation is omitted. □



According to Lemma II.6-1), for a restricted nonanticipative stochastic kernel the mutual information becomes

$$I(X^n; Y^n) = \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{\overrightarrow{q}_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)}\Big)\, \overrightarrow{q}_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \equiv \mathbb{I}(\mu_{0,n}, \overrightarrow{q}_{0,n}) \tag{II.19}$$

where (II.19) states that $I(X^n; Y^n)$ is a functional of $\{\mu_{0,n}, \overrightarrow{q}_{0,n}\}$. Hence, nonanticipative RDF is defined by optimizing $\mathbb{I}(\mu_{0,n}, q_{0,n})$ over $q_{0,n} \in Q_{0,n}(D)$ subject to the realizability constraint $\overrightarrow{q}_{0,n}(dy^n; x^n) = q_{0,n}(dy^n; x^n)$-a.s., which satisfies a distortion constraint.

Definition II.7. (Nonanticipative Rate Distortion Function) Suppose $d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^n \rho_{0,i}(x^i, y^i)$, where $\rho_{0,i} : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \mapsto [0, \infty]$ is a sequence of $\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable distortion functions, for $i = 0, 1, \ldots, n$, and let $\overrightarrow{Q}_{0,n}(D)$ (assumed non-empty) denote the average distortion or fidelity constraint defined by

$$\overrightarrow{Q}_{0,n}(D) \triangleq Q_{0,n}(D) \cap \Big\{q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)\text{-a.s.}\Big\}. \tag{II.20}$$

The nonanticipative RDF associated with the restricted nonanticipative stochastic kernel is defined by

$$R^c_{0,n}(D) \triangleq \inf_{q_{0,n} \in \overrightarrow{Q}_{0,n}(D)} \mathbb{I}(\mu_{0,n}, q_{0,n}). \tag{II.21}$$

Clearly, $R^c_{0,n}(D)$ is characterized by minimizing mutual information, or equivalently $\mathbb{I}(\mu_{0,n}, q_{0,n})$, over $Q_{0,n}(D)$ and the nonanticipative constraint (I.4). In the work of [10], nonanticipative RDF is called $\epsilon$-entropy and nonanticipation is defined by requiring that $X^n_{i+1} \leftrightarrow X^i \leftrightarrow Y^i$ forms a Markov chain for each $i = 0, 1, \ldots, n-1$, which implies $q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)$.

III. Existence of Optimal Reconstruction Kernel

In this section, appropriate topologies and function spaces are introduced and existence of the minimizing nonanticipative product kernel in (II.21) is proved. The construction of spaces is based on [22].



A. Abstract Spaces 

Let $BC(\mathcal{Y}_{0,n})$ denote the vector space of bounded continuous real-valued functions defined on the Polish space $\mathcal{Y}_{0,n}$. Furnished with the sup norm topology, this is a Banach space. Denote by $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ the space of all $\mu_{0,n}$-integrable functions defined on $\mathcal{X}_{0,n}$ with values in $BC(\mathcal{Y}_{0,n})$, so that for each $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ its norm is defined by

$$\|\phi\|_{\mu_{0,n}} \triangleq \int_{\mathcal{X}_{0,n}} \|\phi(x^n, \cdot)\|_{BC(\mathcal{Y}_{0,n})}\, \mu_{0,n}(dx^n) < \infty.$$

The norm topology $\|\phi\|_{\mu_{0,n}}$ makes $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ a Banach space. The topological dual of $BC(\mathcal{Y}_{0,n})$, denoted by $(BC(\mathcal{Y}_{0,n}))^*$, is isometrically isomorphic to the Banach space of finitely additive regular bounded signed measures on $\mathcal{Y}_{0,n}$ [23], denoted by $M_{rba}(\mathcal{Y}_{0,n})$. Let $\Pi_{rba}(\mathcal{Y}_{0,n}) \subset M_{rba}(\mathcal{Y}_{0,n})$ denote the set of regular bounded finitely additive probability measures on $\mathcal{Y}_{0,n}$. Clearly, if $\mathcal{Y}_{0,n}$ is compact, then $(BC(\mathcal{Y}_{0,n}))^*$ will be isometrically isomorphic to the space of countably additive signed measures, as in [21]. It follows from the theory of "lifting" [24] that the dual of the space $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ is $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$, denoting the space of all $M_{rba}(\mathcal{Y}_{0,n})$-valued functions $\{q\}$ which are weak*-measurable in the sense that for each $\phi \in BC(\mathcal{Y}_{0,n})$, $x^n \mapsto q_{x^n}(\phi) \triangleq \int_{\mathcal{Y}_{0,n}} \phi(y^n)\, q(dy^n; x^n)$ is $\mu_{0,n}$-measurable and $\mu_{0,n}$-essentially bounded.

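B. Weak*-Compactness and Existence

Next, we prepare to prove existence of a solution to $R^c_{0,n}(D)$. Define an admissible set of stochastic kernels associated with the classical rate distortion function by

$$Q_{ad} \triangleq L^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n})) \subset L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n})).$$

Clearly, $Q_{ad}$ is a unit sphere in $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. For each $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ we can define a linear functional on $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ by

$$\ell_\phi(q_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} \phi(x^n, y^n)\, q_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n).$$

This is a bounded, linear and weak*-continuous functional on $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$, as shown below. For $q_{0,n} \in Q_{ad}$,

$$|\ell_\phi(q_{0,n})| \leq \int_{\mathcal{X}_{0,n}} \|\phi(x^n, \cdot)\|_{BC(\mathcal{Y}_{0,n})}\, \mu_{0,n}(dx^n) = \|\phi\|_{L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))} < \infty.$$

So given $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$, there exists a $c_\phi < \infty$ such that $\|\ell_\phi\| \leq c_\phi$. Therefore, $\ell_\phi$ is a bounded, linear functional on $L^w_\infty(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$ and hence on $L^w_\infty(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. Thus, it is continuous in the weak*-sense.

For $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty)$ measurable and $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$, the distortion constraint set of the classical RDF is given by

$$Q_{0,n}(D) \triangleq \{q_{0,n} \in Q_{ad} : \ell_{d_{0,n}}(q_{0,n}) \leq D\}.$$

The next result is shown in [22]; it utilizes Alaoglu's theorem [23], by which a weak*-closed and bounded subset of the dual of a Banach space is weak*-compact. These will be used to establish existence of a minimizer in $\overrightarrow{Q}_{ad}$ for the nonanticipative RDF $R^c_{0,n}(D)$.

Lemma III.1. [22] For $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$, the set $Q_{0,n}(D)$ is a bounded and weak*-closed subset of $Q_{ad}$ (hence weak*-compact).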



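Now we prepare to consider the problem stated in Definition II.7. First, we show weak*-compactness of $\overrightarrow{Q}_{ad}$, defined as a subset of $Q_{ad}$ as follows.

$$\overrightarrow{Q}_{ad} \triangleq \big\{q_{0,n} \in Q_{ad} : q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)\text{-a.s.}\big\}.$$

The average distortion constraint set for the nonanticipative RDF is defined by

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{q_{0,n} \in Q_{ad} : \ell_{d_{0,n}}(q_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, q_{0,n}(dy^n; x^n)\Big) \otimes \mu_{0,n}(dx^n) \leq D\Big\} \cap \overrightarrow{Q}_{ad}$$
$$= \Big\{q_{0,n} \in Q_{ad} : \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n)\Big) \otimes \mu_{0,n}(dx^n) \leq D\Big\}, \quad D \geq 0.$$

Since we are interested in proving existence of the nonanticipative RDF of Definition II.7, we shall first show that $\overrightarrow{Q}_{ad}$ is weak*-closed, and then utilize Lemma III.1 to establish weak*-compactness for $\overrightarrow{Q}_{ad}$ and then weak*-compactness of $\overrightarrow{Q}_{0,n}(D)$.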



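Lemma III.2. Let $\mathcal{X}_{0,n}$ and $\mathcal{Y}_{0,n}$ be Polish spaces and introduce the net $\{q_i^\alpha(dy_i; y^{i-1}, x^i)\}$, where $\alpha \in (\mathcal{D}, \succeq)$, and $q_i^\alpha \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$. Assume

(a) $q_i^\alpha(\cdot; y^{i-1}, x^i) \xrightarrow{w^*} q_i^0(\cdot; y^{i-1}, x^i)$ for $i = 1, \ldots, n$;

(b) for all $h_i(\cdot, \cdot) \in L_1(\mu_i, BC(\mathcal{Y}_i))$ the function

$$(x^{i-1}, y^{i-1}) \in \mathcal{X}_{0,i-1} \times \mathcal{Y}_{0,i-1} \longmapsto \int_{\mathcal{X}_i} \int_{\mathcal{Y}_i} h_i(x_i, y)\, q_i^0(dy; y^{i-1}, x^i)\, \mu_i(dx_i; x^{i-1})$$

is in $L_1(\mu_{0,i-1}, BC(\mathcal{Y}_{0,i-1}))$ for $i = 0, 1, \ldots, n$;

(c) for all $h_i(\cdot, \cdot) \in L_1(\mu_i, BC(\mathcal{Y}_i))$ and for every $\epsilon > 0$ there exists $\alpha_\epsilon$ such that, for all $\alpha \succeq \alpha_\epsilon$,

$$\sup_{y^{i-1} \in \mathcal{Y}_{0,i-1}} \Big| \int_{\mathcal{X}_i} \Big( \int_{\mathcal{Y}_i} h_i(x_i, y_i)\, q_i^\alpha(dy_i; y^{i-1}, x^i) - \int_{\mathcal{Y}_i} h_i(x_i, y_i)\, q_i^0(dy_i; y^{i-1}, x^i) \Big) \mu_i(dx_i | x^{i-1}) \Big| < \epsilon, \quad \forall x^{i-1} \in \mathcal{X}_{0,i-1}.$$

Then the convolution stochastic kernels converge in the weak*-sense as follows.

$$\overrightarrow{q}^\alpha_{0,n} \xrightarrow{w^*} \overrightarrow{q}^0_{0,n}, \tag{III.22}$$

i.e., the set $\overrightarrow{Q}_{ad}$ is weak*-closed.

Proof: See Appendix. □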
Next, we utilize the weak*-compactness of $\overrightarrow{Q}_{ad}$ to show that $\overrightarrow{Q}_{0,n}(D)$ is also weak*-compact.

Remark III.3. There are certain important cases in which $d_{0,n}$ may not be bounded. This is the case when $d_{0,n}$ is a metric of a linear metric space. The next theorem is crucial in extending the weak*-closedness property of $\overrightarrow{Q}_{0,n}(D)$ to those distortion functions $d_{0,n}$ which are not necessarily bounded, since they are measurable functions from the class $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$.

Theorem III.4. Let $\mathcal{X}_{0,n}$, $\mathcal{Y}_{0,n}$ be two Polish spaces and $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty]$ a measurable, nonnegative, extended real-valued function, such that for a fixed $x^n \in \mathcal{X}_{0,n}$, $y^n \mapsto d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$, for $\mu_{0,n}$-almost all $x^n \in \mathcal{X}_{0,n}$, and suppose the conditions of Lemma III.2 hold. For any $D \in [0, \infty)$, the set $\overrightarrow{Q}_{0,n}(D)$ is a weak*-closed subset of $\overrightarrow{Q}_{ad}$ and hence weak*-compact.

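Proof: Let $\{\overrightarrow{q}^\alpha_{0,n}\} \subset \overrightarrow{Q}_{0,n}(D) \subset \overrightarrow{Q}_{ad}$ be a net. Since $\overrightarrow{Q}_{ad}$ is weak*-compact, there exists a subnet of the net $\{\overrightarrow{q}^\alpha_{0,n}\}$, relabelled as the original net, and an element $\overrightarrow{q}^0_{0,n} \in \overrightarrow{Q}_{ad}$ such that $\overrightarrow{q}^\alpha_{0,n} \xrightarrow{w^*} \overrightarrow{q}^0_{0,n}$, i.e.,

$$\Big| \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} \phi(x^n, y^n)\, \overrightarrow{q}^\alpha_{0,n}(dy^n; x^n)\, \mu_{0,n}(dx^n) - \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} \phi(x^n, y^n)\, \overrightarrow{q}^0_{0,n}(dy^n; x^n)\, \mu_{0,n}(dx^n) \Big| \longrightarrow 0$$

for any $\phi \in L_1(\mu_{0,n}; BC(\mathcal{Y}_{0,n}))$. We must show that $\overrightarrow{q}^0_{0,n} \in \overrightarrow{Q}_{0,n}(D)$. Considering the sequence $\{d^k_{0,n} \triangleq d_{0,n} \wedge k,\ k \in \mathbb{N}\}$, which are bounded, measurable functions (continuous in the second argument), it follows from the weak*-convergence of the net $\{\overrightarrow{q}^\alpha_{0,n}\}$ to $\overrightarrow{q}^0_{0,n}$ that

$$\int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d^k_{0,n}(x^n, y^n)\, \overrightarrow{q}^0_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) = \lim_\alpha \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d^k_{0,n}(x^n, y^n)\, \overrightarrow{q}^\alpha_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) \tag{III.23}$$

for each $k \in \mathbb{N}$. Since $d_{0,n}$ is non-negative, $d^k_{0,n} \uparrow d_{0,n}$ as $k \rightarrow \infty$, and $\overrightarrow{q}^\alpha_{0,n} \in \overrightarrow{Q}_{0,n}(D)$, we have

$$\int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d^k_{0,n}(x^n, y^n)\, \overrightarrow{q}^0_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) = \lim_\alpha \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d^k_{0,n}(x^n, y^n)\, \overrightarrow{q}^\alpha_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n)$$
$$\leq \lim_\alpha \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}^\alpha_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) \leq D,$$

which is valid for all $k \in \mathbb{N}$. Since $d^k_{0,n} \uparrow d_{0,n}$ and they are non-negative, it follows from Lebesgue's monotone convergence theorem and non-negativity of stochastic kernels that

$$\int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}^0_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) \leq D.$$

This shows that the weak*-limit $\overrightarrow{q}^0_{0,n} \in \overrightarrow{Q}_{0,n}(D)$, and hence we have proved that the set $\overrightarrow{Q}_{0,n}(D)$ is a weak*-closed subset of $\overrightarrow{Q}_{ad}$. By Alaoglu's theorem [23], being a weak*-closed subset of a weak*-compact set, it is weak*-compact. □

October 5, 2012 DRAFT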



Based on Theorem III.4 and lower semicontinuity of relative entropy, we show existence of the optimal reconstruction conditional distribution for the nonanticipative RDF.

Theorem III.5. Under the conditions of Theorem III.4, $R^c_{0,n}(D)$ has a minimum.



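Proof: This follows from Theorem III.4 provided lower semicontinuity of $I(\mu_{0,n}, \cdot)$ on $\overrightarrow{Q}_{ad}$ is established. First we prove that $\overrightarrow{q}_{0,n} \longmapsto I(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is weak*-lower semicontinuous. Let $\{\overrightarrow{q}^\alpha_{0,n}\}$ be a net from $\overrightarrow{Q}_{ad}$ and suppose it is weak*-convergent to $\overrightarrow{q}^0_{0,n}$. Define the net $P^\alpha_{0,n} \in \Pi_{rba}(\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n})$ given by the convolution product $P^\alpha_{0,n}(dx^n, dy^n) = \mu_{0,n}(dx^n) \otimes \overrightarrow{q}^\alpha_{0,n}(dy^n; x^n)$. Take any $\varphi(\cdot) \in BC(\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n})$ and consider the expression

$$\int \varphi(x^n, y^n)\, P^\alpha_{0,n}(dx^n, dy^n) = \int \varphi(x^n, y^n)\, \overrightarrow{q}^\alpha_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n).$$

Since $\overrightarrow{q}^\alpha_{0,n} \xrightarrow{w^*} \overrightarrow{q}^0_{0,n}$ in $L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$, it is clear from the above expression that

$$P^\alpha_{0,n} \xrightarrow{w^*} P^0_{0,n} = \mu_{0,n} \otimes \overrightarrow{q}^0_{0,n} \quad \text{in } \Pi_{rba}(\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}). \qquad \text{(III.24)}$$

Similarly one can easily verify that the net of the product measures $\{\pi^\alpha_{0,n}\}$ converges to the product measure $\pi^0_{0,n}$:

$$\pi^\alpha_{0,n} = \mu_{0,n} \times \nu^\alpha_{0,n} \xrightarrow{w^*} \mu_{0,n} \times \nu^0_{0,n} = \pi^0_{0,n}$$

where $\{\nu^\alpha_{0,n}\}$ are the marginals of $\{P^\alpha_{0,n}\}$ on $\mathcal{Y}_{0,n}$ and $\nu^0_{0,n}$ is its weak*-limit. Now we use the lower semicontinuity property of relative entropy [19, Lemma 1.4.3, p. 36]. Following [19], it is verified that the same procedure holds true not only for countably additive measures but also for finitely additive ones. Using this fact we conclude that

$$\mathbb{D}(P^0_{0,n} \| \pi^0_{0,n}) \leq \liminf_{\alpha} \mathbb{D}(P^\alpha_{0,n} \| \pi^\alpha_{0,n}).$$

By (II.13), this is equivalent to

$$I(\mu_{0,n}, \overrightarrow{q}^0_{0,n}) \leq \liminf_{\alpha} I(\mu_{0,n}, \overrightarrow{q}^\alpha_{0,n}). \qquad \text{(III.25)}$$

This proves weak*-lower semicontinuity of $I(\mu_{0,n}, \cdot)$ on $\overrightarrow{Q}_{ad}$. We have already observed in Theorem III.4 that the set $\overrightarrow{Q}_{0,n}(D)$ is weak*-compact, and we have just seen that $I(\mu_{0,n}, \cdot)$ is weak*-lower semicontinuous. Hence $I(\mu_{0,n}, \cdot)$ attains its infimum on $\overrightarrow{Q}_{0,n}(D)$. So there exists a $\overrightarrow{q}^*_{0,n} \in \overrightarrow{Q}_{0,n}(D)$ such that $R^c_{0,n}(D) = I(\mu_{0,n}, \overrightarrow{q}^*_{0,n})$. □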

IV. Necessary Conditions of Optimality for Nonanticipative RDF 

In this section the form of the optimal nonanticipative convolution reconstruction kernels is derived under a stationarity assumption. The method is based on calculus of variations on the space of measures [25].



Assumption IV.1. The family of measures $\overrightarrow{q}_{0,n}(dy^n; x^n)$ defined in (II.9) is the convolution of stationary conditional distributions.

Assumption IV.1 holds for a stationary process $\{(X_i, Y_i) : i \in \mathbb{N}\}$ and single letter distortion $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho(x_i, y_i)$. It also holds for distortion defined by $\rho(T^i x^n, T^i y^n)$, where $T^i x^n$ is the shift operator on $x^n$ (and similarly for $T^i y^n$). Utilizing Assumption IV.1, the Gateaux differential of $I(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is taken at $\overrightarrow{q}^0_{0,n}$ in the direction of $\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}$, via the definition $\overrightarrow{q}^\epsilon_{0,n} \triangleq \overrightarrow{q}^0_{0,n} + \epsilon\,(\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})$, $\epsilon \in [0, 1]$, since under the stationarity assumption the functionals $\{q_i(dy_i; y^{i-1}, x^i) \in Q(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i}) : i = 0, 1, \ldots, n\}$ are identical.

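Theorem IV.2. Suppose $I_{\mu_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq I(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is well defined for every $\overrightarrow{q}_{0,n} \in L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$, possibly taking values from the set $[0, \infty]$. Then $\overrightarrow{q}_{0,n} \longmapsto I_{\mu_{0,n}}(\overrightarrow{q}_{0,n})$ is Gateaux differentiable at every point in $L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$, and the Gateaux derivative at the point $\overrightarrow{q}^0_{0,n}$ in the direction $\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}$ is given by

$$\delta I_{\mu_{0,n}}(\overrightarrow{q}^0_{0,n}; \overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}) = \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} \log \Big( \frac{\overrightarrow{q}^0_{0,n}(dy^n; x^n)}{\nu^0_{0,n}(dy^n)} \Big) (\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})(dy^n; x^n)\, \mu_{0,n}(dx^n)$$

where $\nu^0_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the marginal measure corresponding to $\overrightarrow{q}^0_{0,n} \otimes \mu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$.

Proof: The proof, although lengthy, is similar to the one in [22]; hence it is omitted. □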



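The constrained problem defined by (II.21) can be reformulated using Lagrange multipliers. The equivalence of constrained and unconstrained problems is established in the following theorem.

Theorem IV.3. Suppose $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$, where $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \longmapsto [0, \infty]$ is continuous in the second argument and the set $\Gamma \triangleq \{(x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} : d_{0,n}(x^n, y^n) < D\}$ is nonempty. Then the constrained problem as stated in Theorem III.5 is equivalent to the unconstrained problem stated below.

$$\inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)} I(\mu_{0,n}, \overrightarrow{q}_{0,n}) = \max_{s \leq 0}\ \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{ad}} \Big\{ I(\mu_{0,n}, \overrightarrow{q}_{0,n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) - D\big) \Big\}$$

$$= \max_{s \leq 0}\ \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{ad}} \Big\{ I(\mu_{0,n}, \overrightarrow{q}_{0,n}) - s\Big( \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n)\, \mu_{0,n}(dx^n) - D \Big) \Big\}$$

where $\overrightarrow{q}_{0,n} = \overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)$-a.s. Further, the infimum occurs on the boundary of the set $\overrightarrow{Q}_{0,n}(D)$.

Proof: See Appendix. □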



Utilizing Theorem IV.3, we can reformulate the constrained problem as an unconstrained problem; hence we have

$$R^c_{0,n}(D) = \sup_{s \leq 0}\ \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{ad}} \Big\{ I(\mu_{0,n}, \overrightarrow{q}_{0,n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) - D\big) \Big\}. \qquad \text{(IV.26)}$$

Note that $\overrightarrow{q}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ are probability measures on $\mathcal{Y}_{0,n}$; therefore, one should introduce another set of Lagrange multipliers. Moreover, $\overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)$ is a consistent probability measure on $\mathcal{Y}_{0,n}$; therefore for each $k = 0, 1, \ldots, n$, $\int_{\mathcal{Y}_{0,k}} \overrightarrow{q}_{0,k}(dy^k; x^k) = 1$. This constraint is expressed via

$$\sum_{i=0}^{n} \int_{\mathcal{X}_{0,i} \times \mathcal{Y}_{0,i}} \lambda_i(x^i, y^{i-1}) \big( \overrightarrow{q}_{0,i}(dy^i; x^i) - 1 \big) \mu_{0,i}(dx^i) = \sum_{i=0}^{n} \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \lambda_i(x^i, y^{i-1}) \big( \overrightarrow{q}_{0,n}(dy^n; x^n) - 1 \big) \mu_{0,n}(dx^n) \qquad \text{(IV.27)}$$

where $\{\lambda_i(\cdot, \cdot) : i = 0, 1, \ldots, n\}$ are Lagrange multipliers.

Utilizing the additional constraint (IV.27) in (IV.26), we derive the optimal reconstruction kernel for the nonanticipative RDF $R^c_{0,n}(D)$. This is given in the following theorem.



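Theorem IV.4. Suppose $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$ and the conditions of Lemma III.2 and Theorem III.4 hold. Then the infimum in (IV.26) is attained at $\overrightarrow{q}^*_{0,n} \in L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$ given by

$$\overrightarrow{q}^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q^*_i(dy_i; y^{i-1}, x^i)\text{-a.s.} = \otimes_{i=0}^{n} \frac{e^{s \rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1})}{\int_{\mathcal{Y}_i} e^{s \rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1})}, \quad s \leq 0 \qquad \text{(IV.28)}$$

and $\nu^*_i(dy_i; y^{i-1}) \in Q(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$. The nonanticipative RDF is given by

$$R^c_{0,n}(D) = sD - \sum_{i=0}^{n} \int_{\mathcal{X}_{0,i} \times \mathcal{Y}_{0,i-1}} \log \Big( \int_{\mathcal{Y}_i} e^{s \rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1}) \Big)\, \overrightarrow{q}^*_{0,i-1}(dy^{i-1}; x^{i-1}) \otimes \mu_{0,i}(dx^i). \qquad \text{(IV.29)}$$

If $R^c_{0,n}(D) > 0$ then $s < 0$ and

$$\sum_{i=0}^{n} \int_{\mathcal{X}_{0,i}} \int_{\mathcal{Y}_{0,i}} \rho(T^i x^n, T^i y^n)\, \overrightarrow{q}^*_{0,i}(dy^i; x^i)\, \mu_{0,i}(dx^i) = D. \qquad \text{(IV.30)}$$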



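Proof: The fully unconstrained problem of (IV.26) is obtained by introducing another set of Lagrange multipliers $\{\lambda_i(\cdot, \cdot) : i = 0, 1, \ldots, n\}$ as in (IV.27). Using the pair of Lagrange multipliers $\{s, \lambda \triangleq \{\lambda_i(\cdot, \cdot) : i = 0, 1, \ldots, n\}\}$, introduce the extended pay-off functional

$$I^{s,\lambda}(\mu_{0,n}, \overrightarrow{q}_{0,n}) \triangleq I(\mu_{0,n}, \overrightarrow{q}_{0,n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) - D\big) - \sum_{i=0}^{n} \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} \lambda_i(x^i, y^{i-1}) \big( \overrightarrow{q}_{0,n}(dy^n; x^n) - 1 \big) \mu_{0,n}(dx^n).$$

This is a fully unconstrained problem on the vector space $L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. Utilizing Theorem IV.2, the Gateaux derivative of $I^{s,\lambda}$ on $L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ at any point $\overrightarrow{q}^0_{0,n}$ in the direction $\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}$ is given by

$$\delta I^{s,\lambda}(\overrightarrow{q}^0_{0,n}; \overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n}) = \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log \Big( \frac{\overrightarrow{q}^0_{0,n}(dy^n; x^n)}{\nu^0_{0,n}(dy^n)} \Big) (\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})(dy^n; x^n)\, \mu_{0,n}(dx^n)$$

$$- s \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, (\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})(dy^n; x^n)\, \mu_{0,n}(dx^n) - \sum_{i=0}^{n} \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \lambda_i(x^i, y^{i-1})\, (\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})(dy^n; x^n)\, \mu_{0,n}(dx^n)$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \Big( \log \frac{\overrightarrow{q}^0_{0,n}(dy^n; x^n)}{\nu^0_{0,n}(dy^n)} - \sum_{i=0}^{n} \big( s \rho(T^i x^n, T^i y^n) + \lambda_i(x^i, y^{i-1}) \big) \Big) (\overrightarrow{q}_{0,n} - \overrightarrow{q}^0_{0,n})(dy^n; x^n)\, \mu_{0,n}(dx^n), \quad \forall\, \overrightarrow{q}_{0,n} \in L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n})).$$

Since $I^{s,\lambda}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is convex in $\overrightarrow{q}_{0,n}$, it follows from the calculus of variations principle that a necessary and sufficient condition for $\overrightarrow{q}^*_{0,n}$ to be a minimizer is $\delta I^{s,\lambda}(\overrightarrow{q}^*_{0,n}; \overrightarrow{q}_{0,n} - \overrightarrow{q}^*_{0,n}) = 0$, $\forall\, \overrightarrow{q}_{0,n} \in L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. Since the Gateaux derivative must be zero for all $\overrightarrow{q}_{0,n} \in L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$, then

$$\log \Big( \frac{\overrightarrow{q}^*_{0,n}(dy^n; x^n)}{\nu^*_{0,n}(dy^n)} \Big) = \sum_{i=0}^{n} \big( s \rho(T^i x^n, T^i y^n) + \lambda_i(x^i, y^{i-1}) \big)\text{-a.s.}$$

Equivalently,

$$q^*_i(dy_i; y^{i-1}, x^i) = e^{s \rho(T^i x^n, T^i y^n) + \lambda_i(x^i, y^{i-1})}\, \nu^*_i(dy_i; y^{i-1})\text{-a.s.}$$

Since $\int_{\mathcal{Y}_i} q^*_i(dy_i; y^{i-1}, x^i) = 1$, then

$$\lambda_i(x^i, y^{i-1}) = -\log \int_{\mathcal{Y}_i} e^{s \rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1}), \quad i = 0, 1, \ldots, n.$$

Hence,

$$\overrightarrow{q}^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q^*_i(dy_i; y^{i-1}, x^i)\text{-a.s.}$$

with $q^*_i$ as in (IV.28). Since $s \leq 0$ and $\lambda_i \geq 0$, $i = 0, 1, \ldots, n$, then $\overrightarrow{q}^*_{0,n} \in L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$. Substituting $\overrightarrow{q}^*_{0,n}$ into $I^{s,\lambda}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ gives (IV.29).

Note that for $s = 0$, $R^c_{0,n}(D) = 0$ and $\overrightarrow{q}^*_{0,n}(dy^n; x^n) = \nu^*_{0,n}(dy^n)$ for $\mu_{0,n}$-almost all $x^n \in \mathcal{X}_{0,n}$. This is trivial, so we must have $s < 0$. From Theorem IV.3 the solution occurs on the boundary of $\overrightarrow{Q}_{0,n}(D)$, giving (IV.30) for $s < 0$. □

3 Due to the stationarity assumption, $\nu_i(\cdot; \cdot) = \nu(\cdot; \cdot)$ and $q^*_i(\cdot; \cdot, \cdot) = q^*(\cdot; \cdot, \cdot)$, $\forall\, i = 0, 1, \ldots, n$.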
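The exponentially tilted form of (IV.28) can be illustrated numerically in a much simpler setting than the one treated here. The sketch below assumes a finite-alphabet, memoryless source with a single letter distortion, so that the kernel does not depend on past reconstructions; this is the classical Blahut-Arimoto fixed point, not the general nonanticipative construction of the theorem. The function names and the binary example are illustrative assumptions.

```python
import math

def tilted_kernel(p_x, rho, s, iters=200):
    """Fixed-point iteration for q(y|x) proportional to exp(s*rho[x][y]) * nu(y),
    where nu is the output marginal induced by q (finite alphabets assumed)."""
    nx, ny = len(p_x), len(rho[0])
    nu = [1.0 / ny] * ny                      # initial output marginal (assumption)
    q = [[1.0 / ny] * ny for _ in range(nx)]
    for _ in range(iters):
        q = []
        for x in range(nx):
            w = [math.exp(s * rho[x][y]) * nu[y] for y in range(ny)]
            z = sum(w)                        # normalizer, equal to exp(-lambda(x))
            q.append([wi / z for wi in w])
        nu = [sum(p_x[x] * q[x][y] for x in range(nx)) for y in range(ny)]
    return q, nu

def distortion(p_x, q, rho):
    """Average distortion under the joint law p(x) q(y|x)."""
    return sum(p_x[x] * q[x][y] * rho[x][y]
               for x in range(len(p_x)) for y in range(len(q[0])))

def mutual_information(p_x, q, nu):
    """Mutual information in nats."""
    return sum(p_x[x] * q[x][y] * math.log(q[x][y] / nu[y])
               for x in range(len(p_x)) for y in range(len(nu)) if q[x][y] > 0)

def rate_from_multiplier(p_x, rho, nu, s, D):
    """Single-letter analogue of (IV.29): s*D - E[log of the normalizer]."""
    return s * D - sum(
        p_x[x] * math.log(sum(math.exp(s * rho[x][y]) * nu[y] for y in range(len(nu))))
        for x in range(len(p_x)))

# Binary uniform source with Hamming distortion and multiplier s = -2.
p = [0.5, 0.5]
hamming = [[0.0, 1.0], [1.0, 0.0]]
s = -2.0
q_star, nu_star = tilted_kernel(p, hamming, s)
D = distortion(p, q_star, hamming)
print(D, mutual_information(p, q_star, nu_star), rate_from_multiplier(p, hamming, nu_star, s, D))
```

In this special case the mutual information of the tilted kernel and the expression $sD - E[\log(\cdot)]$ coincide, which mirrors the relation between (IV.28) and (IV.29).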



Often it is of interest to identify conditions so that the optimal reconstruction is Markov with respect to $\{X_i : i = 0, 1, \ldots, n\}$. The next remark discusses this case.

Remark IV.5. Note that if the distortion function satisfies $\rho(T^i x^n, T^i y^n) = \rho(x_i, T^i y^n)$ then according to Theorem IV.4 we have

$$q^*_i(dy_i; y^{i-1}, x^i) = q^*_i(dy_i; y^{i-1}, x_i)\text{-a.s.}, \quad i \in \mathbb{N}^n \qquad \text{(IV.31)}$$

that is, the reconstruction kernel is Markov in $X^n$. However, even if $\rho(T^i x^n, T^i y^n) = \rho(x_i, y_i)$ (single letter) one cannot claim that the optimal reconstruction distribution is also Markov with respect to $\{Y_i : i = 0, 1, \ldots, n\}$, because the right hand side of (IV.28) does not satisfy $\nu_i(dy_i; y^{i-1}) = \nu_i(dy_i; y_{i-1})$.

V. Realization of Nonanticipative RDF 
The realization of the nonanticipative RDF (optimal reconstruction kernel and nonanticipative RDF) is equivalent to identifying the sensor mapping (see Fig. I.2) which generates the auxiliary random process $\{Z_i : i = 0, 1, \ldots, n\}$ so that the optimal reconstruction conditional distribution is matched from the output of the source to the output of the filter. This intermediate mapping consists of an encoder followed by a channel. Thus, the realization of the nonanticipative optimal reconstruction distribution consists of a communication channel, an encoder and a decoder such that the reconstruction from the sequence $X^n$ to the sequence $Y^n$ matches the nonanticipative rate distortion minimizing reconstruction kernel. Fig. V.3 illustrates a cascade of subsystems that realizes the nonanticipative RDF. For the single letter expression of the classical RDF this is related to the so-called source-channel matching of information theory [26]. It is also described in [27] and [28] for control over finite capacity communication channels, since this technique allows one to design encoding/decoding schemes without incurring encoding and decoding delays. The realization of the optimal reconstruction kernel is given below.

Definition V.1. Given a source $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, \ldots, n\}$, a channel $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1}, a^i) : i = 0, \ldots, n\}$ is a realization of the optimal nonanticipative reconstruction kernel $\{q^*_i(dy_i; y^{i-1}, x^i) : i = 0, \ldots, n\}$ if there exists a pre-channel encoder $\{P_{A_i|A^{i-1},B^{i-1},X^i}(da_i|a^{i-1}, b^{i-1}, x^i) : i = 0, \ldots, n\}$ and a post-channel decoder $\{P_{Y_i|Y^{i-1},B^i}(dy_i|y^{i-1}, b^i) : i = 0, \ldots, n\}$ such that

$$\overrightarrow{q}^*_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1}, x^i)\text{-a.s.} \qquad \text{(V.32)}$$

where the joint distribution is

$$P_{X^n,A^n,B^n,Y^n}(dx^n, da^n, db^n, dy^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},B^i,A^i,X^i}(dy_i|y^{i-1}, b^i, a^i, x^i) \otimes P_{B_i|B^{i-1},A^i,X^i,Y^{i-1}}(db_i|b^{i-1}, a^i, x^i, y^{i-1})$$

$$\otimes\, P_{A_i|A^{i-1},X^i,Y^{i-1},B^{i-1}}(da_i|a^{i-1}, x^i, y^{i-1}, b^{i-1}) \otimes P_{X_i|X^{i-1},A^{i-1},B^{i-1},Y^{i-1}}(dx_i|x^{i-1}, a^{i-1}, b^{i-1}, y^{i-1})\text{-a.s.}$$

$$= \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},B^i}(dy_i|y^{i-1}, b^i) \otimes P_{B_i|B^{i-1},A^i}(db_i|b^{i-1}, a^i) \otimes P_{A_i|A^{i-1},B^{i-1},X^i}(da_i|a^{i-1}, b^{i-1}, x^i) \otimes P_{X_i|X^{i-1}}(dx_i|x^{i-1})\text{-a.s.} \qquad \text{(V.33)}$$

The filter is given by $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0, \ldots, n\}$.



[Figure: cascade of blocks Source, Encoder, Channel, Decoder, which together realize the Optimal Reconstruction Kernel.]

Fig. V.3. Block Diagram of Realizable Nonanticipative Rate Distortion Function



Thus, $\{B_i : i = 0, 1, \ldots, n\}$ is the auxiliary random process which is obtained during the realization procedure in order to define the filter $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0, \ldots, n\}$. Note that unlike Bayesian filtering, in which the auxiliary process represents the observations which are given a priori, in nonanticipative RDF this process is identified during the realization procedure. In the previous definition, the following Markov chain assumptions are taken under consideration.

1) $(X^i, A^i) \leftrightarrow (Y^{i-1}, B^i) \leftrightarrow Y_i$;

2) $(X^i, Y^{i-1}) \leftrightarrow (B^{i-1}, A^i) \leftrightarrow B_i$;

3) $Y^{i-1} \leftrightarrow (A^{i-1}, B^{i-1}, X^i) \leftrightarrow A_i$;

4) $(A^{i-1}, B^{i-1}, Y^{i-1}) \leftrightarrow X^{i-1} \leftrightarrow X_i$.

These conditional independence assumptions are natural since they correspond to data processing inequalities [2]. Thus, if $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1}, a^i) : i = 0, \ldots, n\}$ is a realization of the nonanticipative RDF minimizing kernel $\{q^*_i(dy_i; y^{i-1}, x^i) : i = 0, \ldots, n\}$, then the cascade connecting the source, encoder, channel and decoder achieves the nonanticipative RDF, and the filter is obtained via $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0, \ldots, n\}$.
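The information pattern encoded by the four Markov chains can be made concrete with a small simulation loop. The sketch below is only an illustration of which past variables each block is allowed to see under the factorization (V.33); the function `run_cascade` and the four callables passed to it are hypothetical names, and the noiseless example at the end is chosen so that the output is easy to check by hand.

```python
def run_cascade(n, source, encoder, channel, decoder):
    """Generate (x, a, b, y) for i = 0..n, giving each block only the
    arguments permitted by the factorization (V.33)."""
    x, a, b, y = [], [], [], []
    for _ in range(n + 1):
        x.append(source(x))          # X_i depends on x^{i-1}
        a.append(encoder(a, b, x))   # A_i depends on (a^{i-1}, b^{i-1}, x^i)
        b.append(channel(b, a))      # B_i depends on (b^{i-1}, a^i)
        y.append(decoder(y, b))      # Y_i depends on (y^{i-1}, b^i)
    return x, a, b, y

# Deterministic, noiseless example (assumed purely for illustration):
# the decoder inverts the encoder and channel maps, so y reproduces x.
xs, a_s, bs, ys = run_cascade(
    3,
    source=lambda x_past: float(len(x_past)),
    encoder=lambda a_past, b_past, x_now: 2.0 * x_now[-1],
    channel=lambda b_past, a_now: a_now[-1] + 1.0,
    decoder=lambda y_past, b_now: (b_now[-1] - 1.0) / 2.0,
)
print(xs, ys)
```

Replacing the lambdas with randomized maps gives samples from a joint law of the form (V.33); the loop itself never lets a block access a future symbol, which is the nonanticipative constraint.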



VI. Example 

In this section, we present the filter for Gaussian Markov partially-observable processes by utilizing the realization procedure of Section V.

Consider the following discrete-time partially observed linear Gauss-Markov system described by

$$\begin{cases} X_{t+1} = A X_t + B W_t, & X_0 = X,\ t \in \mathbb{N}^n \\ Y_t = C X_t + D V_t, & t \in \mathbb{N}^n \end{cases} \qquad \text{(VI.34)}$$

where $X_t \in \mathbb{R}^m$ is the state (unobserved) process of the information source (plant), and $Y_t \in \mathbb{R}^p$ is the partially observed (measurement) process.

[Figure: Source (Plant) producing $CX_t$ with additive term $DV_t$, followed by Encoder and Channel with additive noise.]

Fig. VI.4. Communication System

The model in (VI.34) consists of a process $\{X_t : t \in \mathbb{N}^n\}$ which is not directly observed; instead what is directly observed is the process $\{Y_t : t \in \mathbb{N}^n\}$, which is a noisy version of $\{X_t : t \in \mathbb{N}^n\}$. This is a realistic model for any sensor which collects information about an underlying process, since the sensor is a measurement device which is often subject to additive Gaussian noise. Hence, the objective is to compress the sensor data. Assume that $(C, A)$ is detectable and $(A, (BB^{tr})^{1/2})$ is stabilizable, $(D \neq 0)$ [16]. The state and observation noise $\{(W_t, V_t) : t \in \mathbb{N}^n\}$ are mutually independent and independent of the Gaussian RV $X_0$, with parameters $N(\bar{x}_0, \bar{V}_0)$, where $W_t \in \mathbb{R}^k$ and $V_t \in \mathbb{R}^d$ are Gaussian IID processes with zero mean and identity covariances.



The realization will be done following Fig. VI.4. The objective is to reconstruct $\{Y_t : t \in \mathbb{N}^n\}$ by $\{\tilde{Y}_t : t \in \mathbb{N}^n\}$ causally. The distortion is single letter, defined by

$$d_{0,n}(y^n, \tilde{y}^n) \triangleq \frac{1}{n+1} \sum_{i=0}^{n} \| y_i - \tilde{y}_i \|^2_{\mathbb{R}^p}.$$

The objective is to compute

$$R^c_{0,n}(D) = \inf_{\overrightarrow{P}_{\tilde{Y}^n|Y^n} \in \overrightarrow{Q}_{0,n}(D)} I(P_{Y^n}, \overrightarrow{P}_{\tilde{Y}^n|Y^n})$$

where $\overrightarrow{Q}_{0,n}(D) = \{\overrightarrow{P}_{\tilde{Y}^n|Y^n} : E\{d_{0,n}(Y^n, \tilde{Y}^n)\} \leq D\}$, and then realize the reconstruction distribution. The reconstruction of $\{X_t : t \in \mathbb{N}^n\}$ when it is fully observed, i.e., when $Y_t = X_t$, is realized over a scalar additive white Gaussian noise (AWGN) channel in 0, while the partially observed scalar reconstruction of $\{Y_t : t \in \mathbb{N}^n\}$ is realized over a scalar AWGN channel in [27] via indirect methods (utilizing upper bounds which are achievable).

Here, the objective is to consider the vector process $Y_t \in \mathbb{R}^p$ and realize it over a vector AWGN channel. The methodology is based on the explicit formulae of optimal reconstruction of Theorem IV.4. According to Theorem IV.4, the optimal reconstruction is given by

$$\overrightarrow{P}^*_{\tilde{Y}^n|Y^n}(d\tilde{y}^n|y^n) = \otimes_{i=0}^{n} \frac{e^{s \|\tilde{y}_i - y_i\|^2}\, P^*_{\tilde{Y}_i|\tilde{Y}^{i-1}}(d\tilde{y}_i|\tilde{y}^{i-1})}{\int_{\mathcal{Y}_i} e^{s \|\tilde{y}_i - y_i\|^2}\, P^*_{\tilde{Y}_i|\tilde{Y}^{i-1}}(d\tilde{y}_i|\tilde{y}^{i-1})}, \quad s \leq 0. \qquad \text{(VI.35)}$$

Hence, from (VI.35) it follows that $P^*_{\tilde{Y}_i|\tilde{Y}^{i-1},Y^i}(d\tilde{y}_i|\tilde{y}^{i-1}, y^i) = P^*_{\tilde{Y}_i|\tilde{Y}^{i-1},Y_i}(d\tilde{y}_i|\tilde{y}^{i-1}, y_i)$-a.s., that is, the reconstruction is Markov with respect to the process $\{Y_i : i \in \mathbb{N}^n\}$. Moreover, since the exponential term $\|\tilde{y}_i - y_i\|^2$ in the RHS of (VI.35) is quadratic in $(y_i, \tilde{y}_i)$, and $\{X_i : i \in \mathbb{N}^n\}$ is Gaussian, then $\{(X_i, Y_i) : i \in \mathbb{N}^n\}$ is jointly Gaussian, and it follows that a Gaussian distribution $P_{\tilde{Y}_i|\tilde{Y}^{i-1},Y_i}(\cdot|\tilde{y}^{i-1}, y_i)$ (for a fixed realization of $(\tilde{y}^{i-1}, y_i)$) and a Gaussian distribution $P_{\tilde{Y}_i|\tilde{Y}^{i-1}}(\cdot|\tilde{y}^{i-1})$ can match the left and right side of (VI.35). Therefore, at time $t \in \mathbb{N}^n$, the output $\tilde{Y}_t$ of the optimal reconstruction channel depends on $Y_t$ and the previous channel outputs $\tilde{Y}^{t-1}$, and its conditional distribution is Gaussian. Hence, it has the general form

$$\tilde{Y}_t = \bar{A}_t Y_t + \bar{B}_t \tilde{Y}^{t-1} + Z_t, \quad t \in \mathbb{N}^n \qquad \text{(VI.36)}$$

where $\bar{A}_t \in \mathbb{R}^{p \times p}$, $\bar{B}_t \in \mathbb{R}^{p \times tp}$, and $\{Z_t : t \in \mathbb{N}^n\}$ is an independent sequence of Gaussian vectors.

The communication channel (VI.36) can be realized via an additive Gaussian noise channel with feedback defined by

$$B_t = A_t + Z_t, \quad t \in \mathbb{N}^n \qquad \text{(VI.37)}$$

where the encoder is a mapping $A_t = \Phi_t(Y_t, \tilde{Y}^{t-1})$ with power $P_t = \mathrm{Trace}\, E\{A_t A_t^{tr}\}$. For $A_t$ Gaussian the mutual information is [2] $I(A_t; B_t) = \frac{1}{2} \log |I + E\{A_t A_t^{tr}\}\, \mathrm{Cov}(Z_t)^{-1}|$. The decoder at time $t \in \mathbb{N}^n$ receives $B^t$ and computes the reconstruction $\tilde{Y}_t = \Psi_t(B^t, \tilde{Y}^{t-1})$.

Realization of Nonanticipative RDF. The realization is based on the block diagram of Fig. VI.5.



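The encoder consists of a pre-encoder which produces the Gaussian innovation process $\{K_t : t \in \mathbb{N}^n\}$, defined by

$$K_t \triangleq Y_t - E\{Y_t | \sigma\{\tilde{Y}^{t-1}\}\}, \quad t \in \mathbb{N}^n \qquad \text{(VI.38)}$$

whose covariance is defined by $\Lambda_t \triangleq E\{K_t K_t^{tr}\}$. The decoder consists of a pre-decoder $\{\tilde{K}_t : t \in \mathbb{N}^n\}$ which is defined by

$$\tilde{K}_t \triangleq \tilde{Y}_t - E\{Y_t | \sigma\{\tilde{Y}^{t-1}\}\}, \quad t \in \mathbb{N}^n. \qquad \text{(VI.39)}$$

Note that the fidelity criterion satisfies $d_{0,n}(Y^n, \tilde{Y}^n) = d_{0,n}(K^n, \tilde{K}^n) = \frac{1}{n+1} \sum_{i=0}^{n} \|\tilde{K}_i - K_i\|^2$. First, we show that $I(P_{Y^n}, \overrightarrow{P}_{\tilde{Y}^n|Y^n}) = \sum_{i=0}^{n} \big( H(\tilde{K}_i | \tilde{K}^{i-1}) - H(\tilde{K}_i | \tilde{K}^{i-1}, K_i) \big)$.

By (VI.35), the reconstruction kernel at time $i$ depends on $Y^i$ only through $Y_i$. Hence,

$$I(P_{Y^n}, \overrightarrow{P}_{\tilde{Y}^n|Y^n}) = \sum_{i=0}^{n} \big( H(\tilde{Y}_i | \tilde{Y}^{i-1}) - H(\tilde{Y}_i | \tilde{Y}^{i-1}, Y_i) \big).$$

Since entropy is translation invariant, utilizing (VI.39) gives

$$H(\tilde{Y}_i | \tilde{Y}^{i-1}) = H(\tilde{K}_i | \tilde{Y}^{i-1}) = H(\tilde{K}_i | \tilde{Y}^{-1}, \tilde{Y}_0, \ldots, \tilde{Y}_{i-1}) \overset{(a)}{=} H(\tilde{K}_i | \tilde{Y}^{-1}, \tilde{Y}_0, \ldots, \tilde{Y}_{i-2}, \tilde{K}_{i-1}) = H(\tilde{K}_i | \tilde{K}^{i-1}) \qquad \text{(VI.40)}$$

where step (a) replaces $\tilde{Y}_{i-1}$ by $\tilde{K}_{i-1}$ in the conditioning, and repeated application of step (a) gives (VI.40).

Similarly, $H(\tilde{Y}_i | \tilde{Y}^{i-1}, Y_i) = H(\tilde{K}_i | \tilde{Y}^{i-1}, K_i) = H(\tilde{K}_i | \tilde{K}^{i-1}, K_i)$. Hence, $I(P_{Y^n}, \overrightarrow{P}_{\tilde{Y}^n|Y^n}) = \sum_{i=0}^{n} I(K_i; \tilde{K}_i | \tilde{K}^{i-1})$.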

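Now let $\{E_t : t \in \mathbb{N}^n\}$ be the unitary matrix that diagonalizes $\{\Lambda_t : t \in \mathbb{N}^n\}$, such that

$$E_t \Lambda_t E_t^{tr} = \mathrm{diag}\{\lambda_{t,1}, \ldots, \lambda_{t,p}\}, \quad t \in \mathbb{N}^n. \qquad \text{(VI.41)}$$

Define $\Gamma_t \triangleq E_t K_t$ and $\tilde{\Gamma}_t \triangleq E_t \tilde{K}_t$. Then $\{\Gamma_t : t \in \mathbb{N}^n\}$ is an orthogonal process. Moreover, $d_{0,n}(K^n, \tilde{K}^n) = d_{0,n}(\Gamma^n, \tilde{\Gamma}^n) = \frac{1}{n+1} \sum_{t=0}^{n} \|\tilde{\Gamma}_t - \Gamma_t\|^2$ and $I(K_t; \tilde{K}_t | \tilde{K}^{t-1}) = I(\Gamma_t; \tilde{\Gamma}_t | \tilde{\Gamma}^{t-1})$, $\forall t \in \mathbb{N}^n$.

Choose $\{\delta_{t,i}\}$ such that

$$\delta_{t,i} = \begin{cases} \xi_t & \text{if } \xi_t \leq \lambda_{t,i} \\ \lambda_{t,i} & \text{if } \xi_t > \lambda_{t,i} \end{cases}, \quad t \in \mathbb{N}^n,\ i = 1, \ldots, p$$

where $\{\xi_t : t \in \mathbb{N}^n\}$ satisfies $\sum_{i=1}^{p} \delta_{t,i} = D$.

Then by [?], the minimization of $\sum_{t=0}^{n} I(\Gamma_t; \tilde{\Gamma}_t | \tilde{\Gamma}^{t-1})$ subject to the fidelity constraint has a solution

$$\overrightarrow{P}^*_{\tilde{\Gamma}^n|\Gamma^n}(d\tilde{\gamma}^n | \gamma^n) = \otimes_{t=0}^{n} P^*_{\tilde{\Gamma}_t|\Gamma_t}(d\tilde{\gamma}_t | \gamma_t)$$

where $P^*_{\tilde{\Gamma}_t|\Gamma_t}(\cdot|\cdot) \sim N(H_t \Gamma_t, H_t \Delta_t)$, $\eta_{t,i} = 1 - \frac{\delta_{t,i}}{\lambda_{t,i}}$, $i = 1, \ldots, p$, and $R^c_{0,n}(D) = \frac{1}{2} \frac{1}{n+1} \sum_{t=0}^{n} \sum_{i=1}^{p} \log \big( \frac{\lambda_{t,i}}{\delta_{t,i}} \big)$. Thus, the pre-encoder can be further scaled by $\Gamma_t = E_t K_t$, and $\Gamma_t$ is compressed by $A_t = \mathcal{A}_t \Gamma_t$ and sent through an AWGN channel with feedback (shown in Fig. VI.5), after which the received signal is decompressed by $\tilde{\Gamma}_t = \mathcal{B}_t B_t$ in the pre-decoder. By the knowledge of the channel output at the decoder, the mean square estimator $\hat{X}_t$ is generated at the decoder, since $\hat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\}$ (one may also use $\sigma\{B^{t-1}\}$ to find the filter of $\{X_t : t \in \mathbb{N}^n\}$). The complete design is illustrated in Fig. VI.5. Next we pick a specific AWGN channel, which may be a vector or a scalar channel.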
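The choice of $\{\delta_{t,i}\}$ above is a reverse water-filling allocation at each time $t$. The following is a minimal sketch of that allocation for a single time step, assuming strictly positive eigenvalues and a distortion level $0 < D \leq \sum_i \lambda_{t,i}$; the water level $\xi_t$ is found by bisection. The function name and the numerical example are illustrative, and the rate is reported in nats.

```python
import math

def reverse_water_filling(lams, D, iters=200):
    """Return (xi, deltas, rate) with deltas[i] = min(xi, lams[i]),
    sum(deltas) = D and rate = 0.5 * sum(log(lams[i] / deltas[i]))."""
    if not 0.0 < D <= sum(lams):
        raise ValueError("need 0 < D <= sum of eigenvalues")
    lo, hi = 0.0, max(lams)
    for _ in range(iters):
        xi = 0.5 * (lo + hi)
        # sum(min(xi, lam)) is nondecreasing in xi, so bisection applies
        if sum(min(xi, lam) for lam in lams) < D:
            lo = xi
        else:
            hi = xi
    xi = 0.5 * (lo + hi)
    deltas = [min(xi, lam) for lam in lams]
    rate = 0.5 * sum(math.log(lam / d) for lam, d in zip(lams, deltas))
    return xi, deltas, rate

# Eigenvalues 4, 1, 0.25 and D = 1.5: the smallest component is fully
# discarded (delta = lambda) and the remaining budget is split equally.
xi, deltas, rate = reverse_water_filling([4.0, 1.0, 0.25], 1.5)
print(xi, deltas, rate)
```

Components with $\lambda_{t,i} < \xi_t$ contribute zero rate, which corresponds to $\eta_{t,i} = 0$ in the definition of $H_t$.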

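Vector AWGN Channel. Consider a vector channel $B_t = A_t + Z_t$, $t \in \mathbb{N}^n$, where $Z_t$ is Gaussian zero mean, $Q = \mathrm{Cov}(Z_t) = \mathrm{diag}\{q_1, q_2, \ldots, q_p\}$, and $A_t \in \mathbb{R}^p$. Define $\Delta_t = \mathrm{diag}\{\delta_{t,1}, \ldots, \delta_{t,p}\}$, $H_t = \mathrm{diag}\{\eta_{t,1}, \ldots, \eta_{t,p}\} \in \mathbb{R}^{p \times p}$, and $\eta_{t,i} = 1 - \frac{\delta_{t,i}}{\lambda_{t,i}}$, $i = 1, \ldots, p$. We design $\{(\mathcal{A}_t, \mathcal{B}_t) : t \in \mathbb{N}^n\}$ by

$$\mathcal{A}_t = \sqrt{Q \Delta_t^{-1} H_t}, \quad \mathcal{B}_t = \sqrt{H_t \Delta_t Q^{-1}}, \quad t \in \mathbb{N}^n. \qquad \text{(VI.42)}$$

Fig. VI.5. Design of Discrete-Time Communication System

Therefore,

$$\tilde{\Gamma}_t = \mathcal{B}_t B_t = \mathcal{B}_t (A_t + Z_t) = \mathcal{B}_t (\mathcal{A}_t \Gamma_t + Z_t) = H_t E_t K_t + \mathcal{B}_t Z_t, \quad \Gamma_t = E_t K_t,\ t \in \mathbb{N}^n.$$

By pre-multiplying $\tilde{\Gamma}_t$ by $E_t^{tr}$ we can construct

$$\tilde{K}_t = E_t^{tr} \tilde{\Gamma}_t = E_t^{tr} H_t E_t K_t + E_t^{tr} \mathcal{B}_t Z_t, \quad t \in \mathbb{N}^n. \qquad \text{(VI.43)}$$

The reconstruction of $Y_t$ is given by the sum of $\tilde{K}_t$ and $C \hat{X}_t$ as follows.

$$\tilde{Y}_t = \Psi_t(B^t, \tilde{Y}^{t-1}) = \tilde{K}_t + C \hat{X}_t, \quad \hat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\} \qquad \text{(VI.44)}$$

$$= E_t^{tr} H_t E_t K_t + E_t^{tr} \mathcal{B}_t Z_t + C \hat{X}_t, \quad t \in \mathbb{N}^n. \qquad \text{(VI.45)}$$

Next, it will be shown that the desired distortion is achieved by the above realization, where the reconstruction of $\{Y_t : t \in \mathbb{N}^n\}$ is $\{\tilde{Y}_t : t \in \mathbb{N}^n\}$ given by (VI.45). First, we notice that

$$E\{(Y_t - \tilde{Y}_t)^{tr} (Y_t - \tilde{Y}_t)\} = \mathrm{Trace}\big( E\{(Y_t - \tilde{Y}_t)(Y_t - \tilde{Y}_t)^{tr}\} \big).$$

Then we can compute

$$E\{(Y_t - \tilde{Y}_t)^{tr} (Y_t - \tilde{Y}_t)\} = \mathrm{Trace}\, E\{(K_t - \tilde{K}_t)(K_t - \tilde{K}_t)^{tr}\}$$

$$= \mathrm{Trace}\, E\{(K_t - E_t^{tr} \tilde{\Gamma}_t)(K_t - E_t^{tr} \tilde{\Gamma}_t)^{tr}\}$$

$$= \mathrm{Trace}\, E\{(K_t - E_t^{tr} H_t E_t K_t - E_t^{tr} \mathcal{B}_t Z_t)(K_t - E_t^{tr} H_t E_t K_t - E_t^{tr} \mathcal{B}_t Z_t)^{tr}\}$$

$$= \mathrm{Trace}\, E\{\big((I - E_t^{tr} H_t E_t) K_t - E_t^{tr} \mathcal{B}_t Z_t\big)\big((I - E_t^{tr} H_t E_t) K_t - E_t^{tr} \mathcal{B}_t Z_t\big)^{tr}\}$$

$$= \mathrm{Trace}\{(I - E_t^{tr} H_t E_t) \Lambda_t (I - E_t^{tr} H_t E_t)^{tr} + E_t^{tr} \mathcal{B}_t Q \mathcal{B}_t^{tr} E_t\}$$

$$= \mathrm{Trace}\big[(I - E_t^{tr} H_t E_t) E_t^{tr}\, \mathrm{diag}(\lambda_{t,1}, \ldots, \lambda_{t,p})\, E_t (I - E_t^{tr} H_t E_t)^{tr} + E_t^{tr} \mathcal{B}_t Q \mathcal{B}_t^{tr} E_t\big]$$

$$= \mathrm{Trace}\big[E_t^{tr} \big((I - H_t)\, \mathrm{diag}(\lambda_{t,1}, \ldots, \lambda_{t,p})\, (I - H_t)^{tr} + \mathcal{B}_t Q \mathcal{B}_t^{tr}\big) E_t\big]$$

$$= \sum_{i=1}^{p} \big( (1 - \eta_{t,i})^2 \lambda_{t,i} + \eta_{t,i} \delta_{t,i} \big) = \sum_{i=1}^{p} \delta_{t,i} = D.$$

This shows that the realization of Fig. VI.5 achieves end-to-end average distortion equal to D.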
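The distortion computation above can be checked numerically in the diagonalized coordinates, where all matrices in (VI.42) are diagonal and the calculation decouples across components. The sketch below assumes given eigenvalues $\lambda_{t,i}$, allocations $\delta_{t,i}$ and channel noise variances $q_i$ (the numbers are illustrative), and also evaluates $\frac{1}{2} \log |I + \mathcal{A}_t \mathrm{diag}(\lambda_{t,i}) \mathcal{A}_t^{tr} Q^{-1}|$ to compare it with $\frac{1}{2} \sum_i \log(\lambda_{t,i}/\delta_{t,i})$. Function names are hypothetical.

```python
import math

def design_gains(lams, deltas, qs):
    """Diagonal entries of H_t, A_t = sqrt(Q Delta^-1 H) and B_t = sqrt(H Delta Q^-1)."""
    etas = [max(0.0, 1.0 - d / lam) for lam, d in zip(lams, deltas)]
    enc = [math.sqrt(q * e / d) for q, e, d in zip(qs, etas, deltas)]
    dec = [math.sqrt(e * d / q) for q, e, d in zip(qs, etas, deltas)]
    return etas, enc, dec

def end_to_end(lams, deltas, qs):
    """Per-step distortion and Gaussian mutual information (nats) of the
    scaled channel Gamma_tilde = dec * (enc * Gamma + Z), componentwise."""
    _, enc, dec = design_gains(lams, deltas, qs)
    dist = sum((1.0 - b * a) ** 2 * lam + b * b * q
               for a, b, lam, q in zip(enc, dec, lams, qs))
    info = 0.5 * sum(math.log(1.0 + a * a * lam / q)
                     for a, lam, q in zip(enc, lams, qs))
    return dist, info

lams = [4.0, 1.0, 0.25]       # eigenvalues of Lambda_t (assumed)
deltas = [0.625, 0.625, 0.25]  # reverse water-filling allocation for D = 1.5
qs = [1.0, 2.0, 3.0]           # channel noise variances (assumed)
dist, info = end_to_end(lams, deltas, qs)
print(dist, info)
```

The result does not depend on the noise variances $q_i$: the encoder gain absorbs them, which is why the same distortion and rate are obtained for any diagonal $Q$.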
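Decoder. The decoder is $\tilde{Y}_t = \tilde{K}_t + C \hat{X}_t$, where $\{\hat{X}_t : t \in \mathbb{N}^n\}$ is obtained from the modified Kalman filter as follows. Recall that

$$\tilde{Y}_t = \tilde{K}_t + C \hat{X}_t$$

$$= E_t^{tr} H_t E_t (Y_t - C \hat{X}_t) + E_t^{tr} \mathcal{B}_t Z_t + C \hat{X}_t$$

$$= E_t^{tr} H_t E_t (C X_t + D V_t - C \hat{X}_t) + E_t^{tr} \mathcal{B}_t Z_t + C \hat{X}_t$$

$$= E_t^{tr} H_t E_t C X_t - E_t^{tr} H_t E_t C \hat{X}_t + C \hat{X}_t + \big(E_t^{tr} H_t E_t D V_t + E_t^{tr} \mathcal{B}_t Z_t\big) \qquad \text{(VI.46)}$$

where $\{V_t : t \in \mathbb{N}^n\}$ and $\{Z_t : t \in \mathbb{N}^n\}$ are independent Gaussian vectors. Then $\hat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\}$ is given by the modified Kalman filter

$$\hat{X}_{t+1} = A \hat{X}_t + A \Sigma_t (E_t^{tr} H_t E_t C)^{tr} M_t^{-1} (\tilde{Y}_t - C \hat{X}_t), \quad \hat{X}_0 = \bar{x}_0 \qquad \text{(VI.47)}$$

$$\Sigma_{t+1} = A \Sigma_t A^{tr} - A \Sigma_t (E_t^{tr} H_t E_t C)^{tr} M_t^{-1} (E_t^{tr} H_t E_t C) \Sigma_t A^{tr} + B B^{tr}, \quad \Sigma_0 = \bar{\Sigma}_0 \qquad \text{(VI.48)}$$

where

$$M_t = E_t^{tr} H_t E_t C \Sigma_t (E_t^{tr} H_t E_t C)^{tr} + E_t^{tr} H_t E_t D D^{tr} (E_t^{tr} H_t E_t)^{tr} + E_t^{tr} \mathcal{B}_t Q \mathcal{B}_t^{tr} E_t.$$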
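For intuition, the recursion (VI.47)-(VI.48) can be written out in the scalar case $m = p = 1$, where $E_t = 1$ and the reverse water-filling step reduces to $\delta_t = \min(D, \lambda_t)$. The sketch below is an illustration under these assumptions, not the general vector filter: it takes $\lambda_t = c^2 \Sigma_t + d^2$ as the innovation variance implied by (VI.38) with $\Sigma_t$ the prediction error variance, and uses $\eta_t \delta_t$ as the variance of the scaled channel noise. All names and parameter values are hypothetical.

```python
def riccati_step(sigma, a, b, c, d, D):
    """One step of the scalar version of (VI.48); returns (sigma_next, gain)."""
    lam = c * c * sigma + d * d              # innovation variance of Y_t
    delta = min(D, lam)                      # scalar reverse water-filling
    eta = 1.0 - delta / lam
    g = eta * c                              # scalar E^tr H E C
    m = g * g * sigma + (eta * d) ** 2 + eta * delta   # scalar M_t
    gain = a * sigma * g / m if m > 0.0 else 0.0       # scalar Kalman gain
    sigma_next = a * a * sigma - gain * g * sigma * a + b * b
    return sigma_next, gain

def steady_state(a, b, c, d, D, sigma0=1.0, steps=200):
    """Iterate the scalar Riccati map from sigma0 for a fixed number of steps."""
    sigma = sigma0
    for _ in range(steps):
        sigma, _ = riccati_step(sigma, a, b, c, d, D)
    return sigma

# Stable scalar plant with illustrative parameters.
a, b, c, d, D = 0.9, 1.0, 1.0, 0.5, 0.3
sigma_inf = steady_state(a, b, c, d, D)
print(sigma_inf)
```

In the scalar case the quantity $M_t$ simplifies to $\eta_t \lambda_t$, so the update depends on the channel only through the ratio $\delta_t / \lambda_t$; the steady state corresponds to the infinite horizon regime.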

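Infinite Horizon. As $t \to \infty$, under the assumption that the linear Gauss-Markov system is stabilizable and detectable, we have

$$\Sigma_\infty = A \Sigma_\infty A^{tr} - A \Sigma_\infty (E_\infty^{tr} H_\infty E_\infty C)^{tr} M_\infty^{-1} (E_\infty^{tr} H_\infty E_\infty C) \Sigma_\infty A^{tr} + B B^{tr}$$

where

$$M_\infty = E_\infty^{tr} H_\infty E_\infty C \Sigma_\infty (E_\infty^{tr} H_\infty E_\infty C)^{tr} + E_\infty^{tr} H_\infty E_\infty D D^{tr} (E_\infty^{tr} H_\infty E_\infty)^{tr} + E_\infty^{tr} \mathcal{B}_\infty Q \mathcal{B}_\infty^{tr} E_\infty$$

and $E_\infty$ is the unitary matrix that diagonalizes $\Lambda_\infty$, given by

$$E_\infty \Lambda_\infty E_\infty^{tr} = \mathrm{diag}(\lambda_{\infty,1}, \ldots, \lambda_{\infty,p})$$

and

$$\delta_{\infty,i} = \begin{cases} \xi_\infty & \text{if } \xi_\infty \leq \lambda_{\infty,i} \\ \lambda_{\infty,i} & \text{if } \xi_\infty > \lambda_{\infty,i} \end{cases}, \quad i = 1, \ldots, p$$

satisfying $\sum_{i=1}^{p} \delta_{\infty,i} = D$. Define

$$\Delta_\infty = \mathrm{diag}(\delta_{\infty,1}, \ldots, \delta_{\infty,p}), \quad H_\infty = \mathrm{diag}(\eta_{\infty,1}, \ldots, \eta_{\infty,p})$$

where $\eta_{\infty,i} = 1 - \frac{\delta_{\infty,i}}{\lambda_{\infty,i}}$. The nonanticipative RDF can be computed as follows.

$$R^c(D) = \lim_{t \to \infty}\ \inf_{\overrightarrow{P}_{\tilde{Y}^t|Y^t} \in \overrightarrow{Q}_{0,t}(D)} \frac{1}{t+1} I(P_{Y^t}, \overrightarrow{P}_{\tilde{Y}^t|Y^t}) = \lim_{t \to \infty} \frac{1}{2} \frac{1}{t+1} \sum_{j=0}^{t} \sum_{i=1}^{p} \log \Big( \frac{\lambda_{j,i}}{\delta_{j,i}} \Big)$$

$$= \frac{1}{2} \sum_{i=1}^{p} \log \Big( \frac{\lambda_{\infty,i}}{\delta_{\infty,i}} \Big) = \frac{1}{2} \log \frac{|\Lambda_\infty|}{|\Delta_\infty|}. \qquad \text{(VI.49)}$$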

The power constraint satisfies $\mathrm{Trace}\, E\{A_t A_t^{tr}\} = P_t$, $\lim_{t \to \infty} P_t = P$. Since $A_t = \mathcal{A}_t E_t K_t$, the capacity of the channel including the encoder but without the decoder (i.e., between $\{A_t : t \in \mathbb{N}\}$ and $\{B_t : t \in \mathbb{N}\}$) is

$$C = \lim_{t \to \infty} \frac{1}{t+1} I(A^t; B^t) = \frac{1}{2} \lim_{t \to \infty} \frac{1}{t+1} \sum_{j=0}^{t} \log \big| I + E\{A_j A_j^{tr}\} Q^{-1} \big|$$

$$= \frac{1}{2} \lim_{t \to \infty} \frac{1}{t+1} \sum_{j=0}^{t} \log \big| I + \mathcal{A}_j E_j \Lambda_j E_j^{tr} \mathcal{A}_j^{tr} Q^{-1} \big| = \frac{1}{2} \log \frac{|\Lambda_\infty|}{|\Delta_\infty|} = R^c(D). \qquad \text{(VI.50)}$$

Thus, for a given distortion level D, $C = R^c(D)$ is the minimum capacity under which there exists a realizable filter for the data reconstruction of $\{Y_t : t \in \mathbb{N}\}$ by $\{\tilde{Y}_t : t \in \mathbb{N}\}$ ensuring an average distortion equal to D. Finally, the filter is the steady state version of (VI.47), (VI.48).

VII. Conclusion

This paper investigates the nonanticipative RDF on abstract spaces. Existence of the optimal reconstruction conditional distribution is shown, and a closed form expression is derived for the stationary case. The relation between filtering theory and nonanticipative rate distortion theory is discussed via a realization procedure. Finally, an example is presented which illustrates the realization of the nonanticipative RDF.



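VIII. Appendix

Proof of Lemma III.2: To show closedness of $\overrightarrow{Q}_{ad}$ as a subset of $Q_{ad}$ it suffices to show that

$$\otimes_{i=0}^{n} q_i^\alpha(\cdot; y^{i-1}, x^i) \xrightarrow{w^*} \otimes_{i=0}^{n} q_i^0(\cdot; y^{i-1}, x^i).$$

This will be shown by induction. Consider $n = 0$. For any $h_0(x_0, y_0) \in L_1(\mu_0, BC(\mathcal{Y}_0))$, by definition of weak*-convergence it follows from (a) that

$$\lim_{\alpha} \int_{\mathcal{X}_0 \times \mathcal{Y}_0} h_0(x_0, y_0)\, q_0^\alpha(dy_0; x_0)\, \mu_0(dx_0) = \int_{\mathcal{X}_0 \times \mathcal{Y}_0} h_0(x_0, y_0)\, q_0^0(dy_0; x_0)\, \mu_0(dx_0).$$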


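Consider $n = 1$. For $h_0(\cdot, \cdot) \in L_1(\mu_0, BC(\mathcal{Y}_0))$ and $h_1(\cdot, \cdot) \in L_1(\mu_1, BC(\mathcal{Y}_1))$, define

$$h_1^\alpha(x_0, y_0) \triangleq \int_{\mathcal{X}_1} \Big( \int_{\mathcal{Y}_1} h_1(x_1, y_1)\, q_1^\alpha(dy_1; y_0, x^1) \Big) \mu_1(dx_1; x_0), \qquad h_1^0(x_0, y_0) \triangleq \int_{\mathcal{X}_1} \Big( \int_{\mathcal{Y}_1} h_1(x_1, y_1)\, q_1^0(dy_1; y_0, x^1) \Big) \mu_1(dx_1; x_0).$$

We need to show that

$$\lim_{\alpha} \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0)\, h_1^\alpha(x_0, y_0)\, q_0^\alpha(dy_0; x_0) \Big) \mu_0(dx_0) = \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0)\, h_1^0(x_0, y_0)\, q_0^0(dy_0; x_0) \Big) \mu_0(dx_0).$$

The difference of the two sides is bounded as follows.

$$\Big| \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0\, h_1^\alpha\, q_0^\alpha(dy_0; x_0)\, \mu_0(dx_0) - \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0\, h_1^0\, q_0^0(dy_0; x_0)\, \mu_0(dx_0) \Big|$$

$$\leq \Big| \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0\, h_1^\alpha\, q_0^\alpha(dy_0; x_0)\, \mu_0(dx_0) - \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0\, h_1^\alpha\, q_0^0(dy_0; x_0)\, \mu_0(dx_0) \Big|$$

$$+ \Big| \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0\, h_1^\alpha\, q_0^0(dy_0; x_0)\, \mu_0(dx_0) - \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0\, h_1^0\, q_0^0(dy_0; x_0)\, \mu_0(dx_0) \Big|.$$

We need to show that both right hand side (RHS) terms go to zero in the limit over $\alpha$. Let $\epsilon > 0$ be given. Then there exists an $\alpha_\epsilon$ such that for all $\alpha \geq \alpha_\epsilon$ the first RHS term can be written as

$$\Big| \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0)\, h_1^\alpha(x_0, y_0) \big( q_0^\alpha(dy_0; x_0) - q_0^0(dy_0; x_0) \big) \Big) \mu_0(dx_0) \Big|$$

$$\leq \int_{\mathcal{X}_0} \Big| \int_{\mathcal{Y}_0} h_0(x_0, y_0)\, h_1^\alpha(x_0, y_0) \big( q_0^\alpha(dy_0; x_0) - q_0^0(dy_0; x_0) \big) \Big|\, \mu_0(dx_0) < \epsilon, \quad \forall \epsilon > 0 \text{ and } \forall \alpha \geq \alpha_\epsilon$$

where the last inequality follows from condition (b), i.e., $h_1^\alpha(\cdot, \cdot) \in L_1(\mu_0, BC(\mathcal{Y}_0))$.

The second RHS term can be written as

$$\Big| \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0(x_0, y_0) \Big( \int_{\mathcal{X}_1} \Big( \int_{\mathcal{Y}_1} h_1(x_1, y_1) \big( q_1^\alpha(dy_1; y_0, x^1) - q_1^0(dy_1; y_0, x^1) \big) \Big) \mu_1(dx_1; x_0) \Big) \otimes q_0^0(dy_0; x_0)\, \mu_0(dx_0) \Big|$$

$$= \Big| \int_{\mathcal{X}_0} \int_{\mathcal{Y}_0} h_0(x_0, y_0)\, \bar{h}_1^\alpha(x_0, y_0)\, q_0^0(dy_0; x_0)\, \mu_0(dx_0) \Big| \qquad \text{(VIII.51)}$$

where $\bar{h}_1^\alpha(x_0, y_0) \triangleq h_1^\alpha(x_0, y_0) - h_1^0(x_0, y_0)$. By condition (c) for $i = 1$, $\forall \epsilon > 0$ and $\alpha \geq \alpha_\epsilon$ we have

$$\sup_{y_0 \in \mathcal{Y}_0} \int_{\mathcal{X}_1} \Big| \int_{\mathcal{Y}_1} h_1(x_1, y_1)\, q_1^\alpha(dy_1; y_0, x^1) - \int_{\mathcal{Y}_1} h_1(x_1, y_1)\, q_1^0(dy_1; y_0, x^1) \Big|\, \mu_1(dx_1; x_0) < \epsilon, \quad \forall x_0 \in \mathcal{X}_0.$$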

Utilizing the last inequality in (VIII.51) yields that (VIII.51) goes to zero in the limit over $\alpha$.

Next, suppose that for $n = k$ and for all $\epsilon > 0$ there exists $\alpha_\epsilon$ such that for any $\alpha \geq \alpha_\epsilon$

$$\Big| \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0) \cdots \Big( \int_{\mathcal{X}_k} \Big( \int_{\mathcal{Y}_k} h_k(x_k, y_k)\, q_k^\alpha(dy_k; y^{k-1}, x^k) \Big) \mu_k(dx_k; x^{k-1}) \Big) \cdots q_0^\alpha(dy_0; x_0) \Big) \mu_0(dx_0)$$

$$- \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0) \cdots \Big( \int_{\mathcal{X}_k} \Big( \int_{\mathcal{Y}_k} h_k(x_k, y_k)\, q_k^0(dy_k; y^{k-1}, x^k) \Big) \mu_k(dx_k; x^{k-1}) \Big) \cdots q_0^0(dy_0; x_0) \Big) \mu_0(dx_0) \Big| < \epsilon.$$

To conclude the derivation, consider $n = k + 1$. We need to show that for all $\epsilon > 0$ there exists $\alpha_\epsilon$ such that for any $\alpha \geq \alpha_\epsilon$

$$\Big| \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0) \cdots \Big( \int_{\mathcal{X}_{k+1}} \Big( \int_{\mathcal{Y}_{k+1}} h_{k+1}(x_{k+1}, y_{k+1})\, q_{k+1}^\alpha(dy_{k+1}; y^k, x^{k+1}) \Big) \mu_{k+1}(dx_{k+1}; x^k) \Big) \cdots q_0^\alpha(dy_0; x_0) \Big) \mu_0(dx_0)$$

$$- \int_{\mathcal{X}_0} \Big( \int_{\mathcal{Y}_0} h_0(x_0, y_0) \cdots \Big( \int_{\mathcal{X}_{k+1}} \Big( \int_{\mathcal{Y}_{k+1}} h_{k+1}(x_{k+1}, y_{k+1})\, q_{k+1}^0(dy_{k+1}; y^k, x^{k+1}) \Big) \mu_{k+1}(dx_{k+1}; x^k) \Big) \cdots q_0^0(dy_0; x_0) \Big) \mu_0(dx_0) \Big| < \epsilon.$$

The left hand side of the last inequality is bounded above by

$$\Big| \int \otimes_{i=0}^{k} h_i(x_i, y_i) \Big( \int_{\mathcal{X}_{k+1}} \int_{\mathcal{Y}_{k+1}} h_{k+1}(x_{k+1}, y_{k+1}) \big( q_{k+1}^\alpha(dy_{k+1}; y^k, x^{k+1}) - q_{k+1}^0(dy_{k+1}; y^k, x^{k+1}) \big) \mu_{k+1}(dx_{k+1}; x^k) \Big) \otimes_{i=0}^{k} q_i^\alpha(dy_i; y^{i-1}, x^i) \otimes \mu_i(dx_i; x^{i-1}) \Big|$$

$$+ \Big| \int \otimes_{i=0}^{k} h_i(x_i, y_i) \Big( \int_{\mathcal{X}_{k+1}} \int_{\mathcal{Y}_{k+1}} h_{k+1}(x_{k+1}, y_{k+1})\, q_{k+1}^0(dy_{k+1}; y^k, x^{k+1})\, \mu_{k+1}(dx_{k+1}; x^k) \Big) \otimes_{i=0}^{k} \big( q_i^\alpha(dy_i; y^{i-1}, x^i) - q_i^0(dy_i; y^{i-1}, x^i) \big) \otimes \mu_i(dx_i; x^{i-1}) \Big|.$$

By condition (c) the following inequality holds, $\forall x^k \in \mathcal{X}_{0,k}$:

$$\sup_{y^k \in \mathcal{Y}_{0,k}} \int_{\mathcal{X}_{k+1}} \Big| \int_{\mathcal{Y}_{k+1}} h_{k+1}(x_{k+1}, y_{k+1}) \big( q_{k+1}^\alpha(dy_{k+1}; y^k, x^{k+1}) - q_{k+1}^0(dy_{k+1}; y^k, x^{k+1}) \big) \Big|\, \mu_{k+1}(dx_{k+1}; x^k) < \epsilon, \quad \forall \epsilon > 0 \text{ and } \forall \alpha \geq \alpha_\epsilon.$$

Also, by condition (b), the inner integral of $h_{k+1}$ against $q_{k+1}^0 \otimes \mu_{k+1}$ belongs to $L_1(\mu_{0,k}, BC(\mathcal{Y}_{0,k}))$. Utilizing the previous observations and the induction hypothesis $\otimes_{i=0}^{k} q_i^\alpha(\cdot; y^{i-1}, x^i) \xrightarrow{w^*} \otimes_{i=0}^{k} q_i^0(\cdot; y^{i-1}, x^i)$ in the two terms above, both terms go to zero in the limit over $\alpha$.

As a result, $\overrightarrow{Q}_{ad}$ is a weak*-closed set. Being a weak*-closed subset of the weak*-compact set $Q_{ad}$, $\overrightarrow{Q}_{ad}$ is also weak*-compact. □



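Proof of Theorem IV.3: The proof is based on the Lagrange Duality theorem [25, Theorem 1, p. 224]. We choose $X = L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$, which is clearly a vector space. For the set $\Omega$ the natural choice is the set $\Omega = \overrightarrow{Q}_{ad} \subset L^\infty_w(\mu_{0,n}; \Pi_{rba}(\mathcal{Y}_{0,n})) \subset X$. Define

$$G(\overrightarrow{q}_{0,n}) \triangleq \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) - D = \int_{\mathcal{X}_{0,n}} \Big( \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n) \Big) \mu_{0,n}(dx^n) - D.$$

It is clear that $G(\cdot)$ is a convex mapping from $L^\infty_w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ into the real line with the natural ordering $(\mathbb{R}, \preceq) = Z$. Also recall that $\overrightarrow{q}_{0,n} \longmapsto I(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is convex and well defined on $\Omega$ and that, by Theorem III.5, $\inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)} I(\mu_{0,n}, \overrightarrow{q}_{0,n})$ exists and is finite. Thus, according to the Lagrange duality theorem referred to above, it suffices to show that there exists a $\overrightarrow{q}^1_{0,n} \in \Omega$ such that

$$G(\overrightarrow{q}^1_{0,n}) = \int_{\mathcal{X}_{0,n}} \Big( \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}^1_{0,n}(dy^n; x^n) \Big) \mu_{0,n}(dx^n) - D < 0.$$

Introduce the sets $A_1 = \{x^n \in \mathcal{X}_{0,n} : \Gamma_{x^n} \neq \emptyset\}$ and $A_0 = \mathcal{X}_{0,n} \setminus A_1$, with $\Gamma_{x^n}$ denoting the $x^n$-section of $\Gamma$. Define the measure valued function $\overrightarrow{q}^1_{0,n}$ as follows:

$$\overrightarrow{q}^1_{0,n}(\Gamma_{x^n}; x^n) = 0,\ \forall x^n \in A_0; \qquad \overrightarrow{q}^1_{0,n}(\mathcal{Y}_{0,n}; x^n) = 1,\ \forall x^n \in \mathcal{X}_{0,n};$$

$$0 \leq \overrightarrow{q}^1_{0,n}(B; x^n) \leq 1,\ B \subset \Gamma_{x^n}; \qquad \overrightarrow{q}^1_{0,n}(\Gamma_{x^n}; x^n) = 1,\ \forall x^n \in A_1$$

where $B \in \mathcal{B}(\mathcal{Y}_{0,n})$. Since by hypothesis $\Gamma \neq \emptyset$ we have $\mu_{0,n}(A_1) > 0$, and thus the kernel $\overrightarrow{q}^1_{0,n}$ is well defined and belongs to $L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$. Using this kernel in the expression for $\ell_{d_{0,n}}(\overrightarrow{q}_{0,n})$, one can easily verify that $\ell_{d_{0,n}}(\overrightarrow{q}^1_{0,n}) < D$ and hence $G(\overrightarrow{q}^1_{0,n}) < 0$. Then, by the Lagrange Duality theory, we arrive at the conclusion of the theorem as stated. Also it follows from the same duality theory that if the infimum is achieved by some $\overrightarrow{q}^*_{0,n} \in L^\infty_w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n}))$, then

$$s \Big( \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}^*_{0,n}(dy^n; x^n)\, \mu_{0,n}(dx^n) - D \Big) = 0. \qquad \text{(VIII.52)}$$

In other words, for non-zero $s \in (-\infty, 0]$, the solution occurs on the boundary. This completes the proof. □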